key: cord-1041072-scqxc2h4 authors: nan title: Comparative Study of COVID-19 Pandemic Progressions in 175 Regions in Australia, Canada, Italy, Japan, Spain, U.K. and USA Using a Novel Model That Considers Testing Capacity and Deficiency in Confirming Infected Cases date: 2021-06-15 journal: IEEE J Biomed Health Inform DOI: 10.1109/jbhi.2021.3089577 sha: 7e8367b0dd31123b92639c75371b03535ecd7b4b doc_id: 1041072 cord_uid: scqxc2h4 Not identified as being exposed or infected, the group of asymptomatic and presymptomatic patients has become the key source of infectious hosts for the COVID-19 pandemic, triggering the re-emergence of outbreaks. Acknowledging the impacts of movement of unidentified patients and the limited testing capacity on understanding the spread of the virus, an augmented Susceptible-Exposed-Infectious-Confirmed-Recovered (SEICR) model integrating intercity migration data and testing capacity is developed to probe into the number of unidentified COVID-19 infected patients. This model allows evaluation of the effectiveness of active interventions, and more accurate prediction of the pandemic progression in a country, region or city. A pseudo-coevolutionary algorithm is adopted in the model fitting to provide an effective estimation of high-dimensional unknown parameter sets using a limited amount of historical data. The model is applied to 175 regions in Australia, Canada, Italy, Japan, Spain, the UK and USA to estimate the number of unconfirmed cases using limited historical data. Results showed that the actual number of infected cases could be 4.309 times as many as the official confirmed number. By implementing mass COVID-19 testing, the number of infected cases could be reduced by about 50%. 
The novel coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1], which has a long evolutionary history in bats [2], began to spread in China in December 2019 and was found to be highly contagious, with a reproductive number as high as 6.49 [3]. Studies have revealed that the mean incubation period of COVID-19 is 5 to 14 days [4], with high transmissibility of the virus from asymptomatic patients [5]-[7] as well as possible transmission from presymptomatic patients [8], [9]. While infected individuals who have tested positive and been confirmed as infected are mostly isolated [10], presymptomatic and asymptomatic patients form a large group of individuals who are mobile in the community and able to infect others [11]. In addition, some unidentified infected patients are still travelling [12]-[14] and continue to spread the virus through transportation networks, restaurants, hotels, business venues, and other venues where they carry out their usual activities and interact with people [15]-[17]. Thus, the group of presymptomatic and asymptomatic patients has become the key source of infectious hosts that may trigger new waves of outbreaks. Up to the end of July 2020, the COVID-19 pandemic had affected 213 countries and caused a total of 17,187,414 infections, with a death toll of 670,202 [18], and second and third waves of outbreaks had begun to emerge in countries or cities where the pandemic had apparently come under control. The number of confirmed cases is known to represent only a fraction of the infected patients, and any assessment of the extent of an outbreak in a country or city can only be as accurate as the information obtained on the number of exposed, infected and recovered individuals in that country or city.
Thus, depending on the number of tests the authorities manage to administer, there will be considerable discrepancy between the number of actual infections and the official confirmed cases [19] . Epidemiological studies indicated that 40% to 70% of the population could become infected unless preventive measures are taken, which include early and prompt testing to identify the infected population and prevent continuous onward transmission of the virus [20] . Healthcare professionals around the globe have advocated for improving testing capacity for SARS-CoV-2 to ascertain infection numbers, estimate reproductive numbers, and evaluate epidemiological risks [21] . More and faster tests generally improve the probability of identifying infected but unconfirmed cases, which can help isolate "invisible" COVID-19 patients and disconnect transmission chains effectively. Different countries are at different stages of pandemic progression. Gearing up for mass testing is the only way to assess the scope of infection and the phase of the progression trajectory in order to inform policy makers to formulate and implement effective public health responses. It is thus important to develop disease transmission models that are able to estimate the number of infected but unconfirmed individuals in a certain population to help contain the spread of the virus [22] . In this work, we attempt to fill the main gap between the number of confirmed cases and the actual number of infected cases. Specifically, in the proposed model, an infected individual may become a confirmed case and then recovered or removed. Moreover, an infected individual may also be recovered or removed without being confirmed as infected. In this work, the basic model proposed is a Susceptible-Exposed-Infectious-Confirmed-Recovered (SEICR) model, which has an additional state corresponding to an individual having been confirmed by the authority as being infected [22] . 
In addition, the SEICR model, integrated with the daily infection data, intercity migration data and test data, can estimate the size of the unidentified infected population. Also, a specific parameter is included to adjust the level of active intervention in the simulation of progression profiles, which corresponds quantitatively to the increase in the number of individuals eventually infected due to an additional infected individual at any given time. The proposed model can evaluate the effect of active control measures, and can provide realistic prediction for the spread of COVID-19 in a country or city under different scenarios. The resulting augmented SEICR model is a high-dimension nonlinear dynamic model. In this study, a pseudo-coevolutionary algorithm is adopted for model fitting to estimate the set of unknown parameters, which can then be used for generating future progression profiles. The migration-and-test-capacity augmented SEICR model is applied to study the dynamics of the spread of COVID-19 in Australia, Canada, Italy, Japan, Spain, UK and USA. Daily COVID-19 data, intercity migration data, and COVID-19 test data for 175 states or regions (prefectures in the case of Japan) have been collected from official sources [23] - [27] . By fitting the model with data of the COVID-19 infected cases for Australia (January 30 to June 16, 2020), Canada (January 31 to June 17, 2020), Italy (February 4 to June 17, 2020), Japan (January 31 to May 30, 2020), Spain (February 6 to July 5, 2020), the UK (January 30 to June 7, 2020) and the USA (January 29 to June 7, 2020), the corresponding sets of model parameters are computed and then used to estimate the number of infected but unconfirmed cases and evaluate the pandemic progression in the 175 states or regions. 
Our results revealed that a large percentage of infected patients had not been confirmed, specifically, 75.19% for the USA, 81.48% for Japan, 71.24% for Canada, 78.75% for Australia, 75.36% for Italy, 42.64% for Spain, and 85.17% for the UK. The actual total number of infected individuals could be 4.309 times the official confirmed number. Results also showed that by increasing COVID-19 testing capacity, the number of infected cases could be reduced by about half overall; specifically, the total number of infected cases would reduce by 4.65% for Australia, 58.63% for Canada, 37.51% for Italy, 52.81% for Japan, 10.88% for Spain, 52.15% for the UK and 57.03% for the USA.

The World Health Organization (WHO) declared the COVID-19 outbreak a pandemic on March 11, 2020 and set COVID-19 to the highest alert level [28]. Many government authorities release COVID-19 pandemic data to the public [29]. In this study, we have investigated the spread of COVID-19 based on epidemic data, migration data and COVID-19 testing data of 175 states or regions in seven countries. The available datasets include the cumulative number of confirmed cases, recovered cases, and death tolls, for 8 states in Australia from January 24, 2020; 13 states in Canada from January 25, 2020; 22 regions in Italy from January 30, 2020; 47 prefectures in Japan from January 22, 2020; 19 regions in Spain from February 1, 2020; 14 regions in the UK from January 30, 2020; and 52 regions in the USA (50 states, a federal district and Puerto Rico) from January 22, 2020. In general, on day $t$, the official pandemic data released by region $j$ include the number of cumulative confirmed cases $C_j(t)$, cumulative recovered cases $R_{j,c}(t)$, and confirmed death cases $D_{j,c}(t)$, where subscript $c$ means "confirmed". However, some authorities provide detailed data, while others only release crude information.
In these seven countries, each state or region has generally released detailed data of confirmed cases, recovered cases and death tolls. However, prefectures in Japan have only provided confirmed cases and death tolls. For Canada, some states have released recovered data. For the UK, no recovered data were available from January 22 to May 17, 2020, and no death tolls were available after May 17, 2020. Table I summarizes the available information.

We have collected intercity travel data of the seven countries listed in Table I. The data contain the indicative number of travellers from one city to another within a country [23]-[27]. Note that these migration data do not include international, intracity or intraregion travel. Based on the intercity migration data, we can construct the migration matrix, which is given as

$$M(t) = \begin{bmatrix} m_{11}(t) & m_{12}(t) & \cdots & m_{1N}(t) \\ m_{21}(t) & m_{22}(t) & \cdots & m_{2N}(t) \\ \vdots & \vdots & \ddots & \vdots \\ m_{N1}(t) & m_{N2}(t) & \cdots & m_{NN}(t) \end{bmatrix} \tag{1}$$

where $N$ is the number of regions in a country, and $m_{ij}(t)$ is the migrant volume from region $i$ to region $j$ at time $t$, with $m_{ij} \geq 0$ ($i \neq j$) and $m_{ii} = 0$. Migration matrix $M$ thus effectively describes the network of regions, with human movement constituting the links of the network. Fig. 1 shows the migration network of the USA, where the width of an edge indicates the volume of the migrating population and the arrows indicate direction. Hence, $\sum_{i=1}^{N} m_{ij}(t)$ represents the volume of population moving into region $j$ from other regions, while $\sum_{i=1}^{N} m_{ji}(t)$ stands for the volume of individuals moving out of region $j$ to other regions. Then, the dynamic change of population in region $j$ is

$$\Delta P_j(t) = \sum_{i=1}^{N} m_{ij}(t) - \sum_{i=1}^{N} m_{ji}(t). \tag{2}$$

It is worth noting that the contribution of human migration to the pandemic progression was less significant after travel restrictions and local area lockdowns were implemented, though it played a crucial role in the early spread of the epidemic in China from late December 2019 to mid-January 2020 [17].

Data related to testing capacities of the seven countries have been collected since February 2020 and are shown graphically in Fig. 2. In particular, Fig.
2(b) shows the daily number of tests performed per thousand people (i.e., $N_T(t)/N_p \times 1000$) in each country, where $N_T(t)$ is the daily number of people tested in a region on day $t$ and $N_p$ is the size of the population. The periodic spikes displayed in these data can be attributed to limited test availability during weekends and holidays, and to possible cumulative reporting due to administrative delays. From the data collected, the USA has the highest total testing volume, which is 72 times higher than Japan's. We see that the USA, UK and Australia performed more than 2 tests per thousand people each day until July 25, 2020, whereas Canada, Spain and Italy conducted about 1 test per thousand people. Japan has the lowest testing capacity among the seven countries under study, having performed fewer than 0.1 tests per thousand people. Table II summarizes the testing capacities of the seven countries.

In the traditional Susceptible-Exposed-Infectious-Recovered (SEIR) model, each individual assumes one of four possible states: susceptible (S), exposed (E), infected (I), and recovered/removed (R) [30], [31]. In reality, however, an unconfirmed COVID-19 patient (in state $I_u$) transits into the confirmed state only if he or she tests positive. Moreover, not all COVID-19 patients have a chance to be tested, and thus not all infected individuals will become confirmed. Hence, some of the COVID-19 patients transit to the recovered (removed) state without going through the confirmed state. Based on the traditional SEIR model, we introduce a "confirmed state" C, corresponding to an individual who has tested positive, giving a new SEICR model [22].
Moreover, acknowledging the influences of intercity migration and testing capacity of each region, we propose here a new augmented SEICR model, in which any individual assumes one of six states at any time, i.e., susceptible (S), exposed (E), unconfirmed infected ($I_u$), confirmed (C), confirmed recovered/removed ($R_c$), and unconfirmed recovered/removed ($R_u$). We also assume that no reinfection occurs among recovered patients. For region $j$ at time $t$, the number of individuals in each state is denoted by $S_j(t)$, $E_j(t)$, $I_{j,u}(t)$, $C_j(t)$, $R_{j,c}(t)$ and $R_{j,u}(t)$. The state transitions in this model are illustrated in Fig. 3. The cumulative number of confirmed cases is $C(t) + R_c(t)$, and the cumulative number of unconfirmed cases is $I_u(t) + R_u(t)$.

1) Intercity Migration: Suppose region $j$ has a population $P_j(t)$. Let $m_{ij}(t)$ represent the number of people moving from region $i$ to region $j$ at time $t$. We assume that susceptible individuals are homogeneously mixed in the group of migrating individuals. Then, on day $t$, the number of susceptible individuals moving from region $i$ to region $j$ is

$$m_{ij}(t)\,\frac{S_i(t)}{P_i(t)}. \tag{3}$$

Also, since the number of individuals migrating out of region $j$ is $\sum_{i=1}^{N} m_{ji}(t)$, the number of susceptible individuals moving out of region $j$ is

$$\frac{S_j(t)}{P_j(t)} \sum_{i=1}^{N} m_{ji}(t). \tag{4}$$

Similarly, groups of exposed and unconfirmed individuals follow the same migration rules. In addition, all confirmed individuals are assumed to stay in region $j$, i.e., confirmed cases are isolated.

2) Testing Capacity: In most countries, COVID-19 tests are provided mainly for individuals who have developed symptoms or have been in close contact with infected individuals [32], [33]. As the testing capacity is limited, only a portion of the unconfirmed cases will be confirmed within a short period of time. Let $\lambda_j(N_{T,j}(t))$ denote the confirmation rate of infected individuals in region $j$, which is a function of $N_{T,j}(t)$ (the number of tests performed on day $t$ in region $j$).
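The homogeneous-mixing migration rule under 1) Intercity Migration can be illustrated with a minimal Python sketch. It assumes the migration matrix is stored as a NumPy array with `M[i, j]` holding the number of people moving from region i to region j; the function names and the two-region numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def susceptible_migration(M, S, P):
    """Net daily change in susceptibles per region caused by migration.

    M[i, j]: people moving from region i to region j (zero diagonal).
    S[i], P[i]: susceptible count and total population of region i.
    Assumes susceptibles are homogeneously mixed among migrants.
    """
    M = np.asarray(M, dtype=float)
    frac = np.asarray(S, dtype=float) / np.asarray(P, dtype=float)  # S_i / P_i
    inflow = M.T @ frac                # sum_i m_ij * S_i / P_i
    outflow = frac * M.sum(axis=1)     # (S_j / P_j) * sum_i m_ji
    return inflow - outflow

def population_change(M):
    """Change in population per region: total inflow minus total outflow."""
    M = np.asarray(M, dtype=float)
    return M.sum(axis=0) - M.sum(axis=1)

# Hypothetical two-region example.
M = [[0, 100], [50, 0]]
S = [900, 1800]
P = [1000, 2000]
print(susceptible_migration(M, S, P))
print(population_change(M))
```

Because every migrant leaves one region and enters another, both returned vectors sum to zero, which is a quick sanity check on any migration matrix used with this rule.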
Let $\delta C_j(t)$ represent the number of newly confirmed cases on day $t$ (note that $\delta C_j(t) = C_j(t+1) - C_j(t)$). Suppose, on day $t$, the number of active unconfirmed cases in region $j$ is $I_{j,u}(t)$. A larger $I_{j,u}(t)$ means a higher probability that a test confirms a patient as infected. Hence, in region $j$, the number of newly confirmed cases $\delta C_j(t)$ is proportional to the number of active unconfirmed cases $I_{j,u}(t)$. Furthermore, a larger $N_{T,j}(t)$ means that a larger group of individuals with symptoms will be tested, resulting in more patients being identified. Thus, we have, for region $j$,

$$\delta C_j(t) = \lambda_j(N_{T,j}(t))\, I_{j,u}(t), \tag{5}$$

where $\lambda_j(N_{T,j}(t))$ is a monotonically increasing function which may be conveniently approximated by a simple linear relationship, i.e.,

$$\lambda_j(N_{T,j}(t)) = \lambda_{0,j} + k_{\lambda,j}\, N_{T,j}(t), \tag{6}$$

with $\lambda_{0,j}$ and $k_{\lambda,j}$ being constant parameters for region $j$. Note that the number of newly confirmed cases must not exceed the number of active unconfirmed cases, i.e., $\lambda_j(N_{T,j}(t)) \leq 1$. Hence, there exists an upper threshold $N_{T,\mathrm{thr}}$ of testing capacity which guarantees that all of the unconfirmed cases are identified (confirmed). We may therefore write the rate of confirmation as

$$\lambda_j(N_{T,j}(t)) = \begin{cases} \lambda_{0,j} + k_{\lambda,j}\, N_{T,j}(t), & N_{T,j}(t) < N_{T,\mathrm{thr}}, \\ 1, & N_{T,j}(t) \geq N_{T,\mathrm{thr}}. \end{cases} \tag{7}$$

Then, the number of newly confirmed COVID-19 patients in region $j$ on day $t$ is

$$\delta C_j(t) = \lambda_j(N_{T,j}(t))\, I_{j,u}(t), \tag{8}$$

with $\lambda_j$ given by the capped form above.

3) Active Intervention: The role of active intervention is to limit or remove the contact paths connecting an infected individual with the group of susceptible individuals with whom the infected individual may be in close contact. The aim of implementing active intervention is thus to lower the number of effective contact paths, which can be represented quantitatively by $k^{(c)}$. An illustration is given in Fig. 4, where active intervention reduces $k^{(c)}$ from 5 to 1. It can be readily conceived that when more people are infected, the number of effective contact paths would increase, and so would the size of the susceptible group. In quantitative terms, the number of eventually infected individuals should increase for each additional infected or exposed individual at time $t$.
This is equivalent to adding an extra term to $\Delta S_i(t)$ and $\Delta N^{s}_j(t)$ in the model. This extra intervention term, dependent upon $k^{(c)}$, can be derived as follows. Suppose, in region $j$, the average number of close contacts $k^{(c)}_j$ of each COVID-19 patient is highly influenced by the containment strategy implemented by the authority. We assume that each confirmed case will be isolated and cease to infect others. Thus, the intervention term (9) involves only exposed and unconfirmed infected individuals and is scaled by $k^{(c)}_j$, which corresponds quantitatively to an increase in the number of eventual infected individuals for each additional infected or exposed individual in region $j$. Note that if strict control measures are adopted, $k^{(c)}_j(t)$ will become very small, implying that most contact paths have been disconnected and any newly infected or exposed individual would be unlikely to infect others. Thus, by setting $k^{(c)}_j$ to different values, different levels of active intervention can be simulated.

Incorporating the intercity migration, testing capacity and active intervention terms, the final form of the augmented SEICR model is given as model (10), in which each increment is a one-day forward difference, e.g., $\Delta P_j(t) = P_j(t+1) - P_j(t)$. The physical meanings of all parameters are given in Table III. In particular, the following parameters, which play special epidemiological or control roles, are worth noting:
• $\alpha_j$ represents the rate at which an exposed individual infects a susceptible individual in region $j$. Note that the infection rate is influenced by local factors and control measures, such as lockdown, social distancing, contact tracing and quarantine. Hence, $\alpha$ actually varies from region to region and is time-varying. However, in a short period of time, control measures may not change dramatically. Thus, we can assume that $\alpha_j$ is constant for region $j$. Similarly, the other region-specific parameters can be assumed constant.
• $\kappa$ represents the rate at which an exposed individual transits into an infected individual. This transition rate is a natural property of COVID-19, which can be considered as a constant rate irrespective of local factors and control measures.
• $k_I$ represents the possibility of an unconfirmed infected individual moving from one region to another. It can be assumed constant when migration control is in place.
• $k^{(c)}_j$ represents the effectiveness of the implemented active intervention. Specifically, $k^{(c)}_j < 1$ corresponds to effective control measures being applied and the spread being slowed down, whereas $k^{(c)}_j > 1$ corresponds to less effective control and the spread of the disease being severe. For each additional infected individual, there will be $k^{(c)}_j$ more eventual infected individuals.

Remarks: The model proposed here highlights the crucial role that testing capacity plays in the progression of the pandemic. The augmented SEICR model differs from most existing models (the standard SEIR model and its variants) [34]-[37] in its ability to identify unconfirmed infected cases. The parameter identification, as will be explained in the next subsection, involves testing data and hence differs significantly from that used for most existing models. Results generated from our model reflect the influence of testing capacity, which is not provided in other models.

The model represented by (10) has a large number of unknown parameters to be estimated from historical data through fitting. Specifically, for region $j$, the set of unknown parameters is denoted by $\theta_j$ and given in (11). The procedure for parameter estimation involves minimizing the distance between generated growth trajectories and corresponding historical trajectories. The parameter estimation problem can thus be formulated as an optimization problem. The initial values of susceptible, exposed and unconfirmed cases are unknown and to be estimated, while the initial numbers of confirmed and removed cases are equal to the official initial numbers. Here, we define the initial numbers of susceptible, exposed, infected, and confirmed individuals in region $j$ as $X_{j,0}$, given in (12). Note that $N_j(t_0) = S_{j,0} + E_{j,0} + I_{j,u,0} + C_{j,0} + R_{j,c,0} + R_{j,u,0}$ and $\delta_j = N_j(t_0)/P_j(t_0)$.
Then, $\delta_j$ would not be a parameter to be estimated. Hence, the unknown parameter set is $\Theta = \{X_{1,0}, X_{2,0}, \ldots, X_{N,0}, \theta_1, \theta_2, \ldots, \theta_N, \kappa, k_I\}$. The total number of parameters is thus $11N + 2$, where $N$ is the number of regions. The SEICR model has a set of extended state variables, i.e.,

$$X_j(t) = \left[S_j(t),\ E_j(t),\ I_{j,u}(t),\ C_j(t),\ R_{j,c}(t),\ R_{j,u}(t)\right]^{T}. \tag{13}$$

Overall, for a total of $N$ regions, we have

$$X(t) = \left[X_1(t),\ X_2(t),\ \ldots,\ X_N(t)\right]^{T}. \tag{14}$$

Model (10) can be written as

$$\Delta X(t) = f(X(t), \Theta), \tag{15}$$

where $f(\cdot)$ is the right side of (10), and $\Theta$ is the set of unknown parameters. Writing $\Delta X(t_i) = X(t_i + 1) - X(t_i)$, we have the following model in discrete-time form:

$$X(t_i + 1) = X(t_i) + f(X(t_i), \Theta). \tag{16}$$

Algorithm 1: Algorithm for Calculating Time Series $C_j(t_i)$ and $R_{j,c}(t_i)$.
  Input: Parameter set and initial numbers of infected and exposed individuals of each region, $\Theta = \{X_{1,0}, X_{2,0}, \cdots, X_{N,0}, \theta_1, \theta_2, \cdots, \theta_N, \kappa, k_I\}$, where $\theta_j$ and $X_{j,0}$ are defined in equations (11) and (12).
  Calculate $X(t_i + 1)$ from model (10), i.e., from its discrete-time form (16).

Then, the parameter estimation problem can be formulated as a constrained nonlinear optimization problem (CNOP) [38], referred to as (17): find the parameter set $\Theta$, bounded between $\Theta_L$ and $\Theta_U$, that minimizes the weighted discrepancy between the model estimates and the official data. Here, $\hat{C}_j(t_i|\Theta)$ and $\hat{R}_{j,c}(t_i|\Theta)$ represent the estimated numbers of confirmed cases and confirmed removed cases at time $t_i$ with parameter set $\Theta$ and initial condition $X_0$, and $w^{(c)}_{ij}$ and $w^{(u)}_{ij}$ stand for the weighting coefficients, which are constant. In this work, an inverse approach is taken to find the unknown parameters and states by solving (17). Moreover, the values of the confirmed and recovered cases can be derived directly from the available data, as outlined in Algorithm 1.

Algorithm 2: Pseudo-Coevolutionary Algorithm for Estimating the Parameter Set $\Theta^*$.
  Input: The set of unknown parameters and initial numbers of infected and exposed individuals of each region, $\{\theta_j, X_{j,0}\}$, $j = 1, 2, \cdots, N$.
  Output: Optimal parameter set $\Theta^*$.
  Initialization: Initialize temperature $T$, and take a random starting point. The index set of adopted regions is $\Phi = \{1, 2, \cdots, K\}$. Apply the evolutionary algorithm to optimize the parameter set to achieve $\hat{\Theta}^*$, i.e., $\Theta_0 \leftarrow \hat{\Theta}^*$.
  Loop:
  for i = 0 to M do
    for j = 1 to N do
      Set the model parameters as $\Theta_0$ and apply Algorithm 1 to derive the time series $\hat{C}_j(t_i)$ and $\hat{R}_{j,c}(t_i)$.
      Use evaluation criterion (18) to find $\mathrm{RMSPE}_j$ for each region.
    end for
    Find the indices of the $m$ largest $\mathrm{RMSPE}_j$ and set $\Phi = \{a_1, a_2, \cdots, a_m\}$, where $a_i$ represents the index of the $i$th largest $\mathrm{RMSPE}_j$.
    Apply the evolutionary algorithm to optimize the parameter set to achieve $\hat{\Theta}^*$, i.e., $\Theta_0 \leftarrow \hat{\Theta}^*$.
  end for
  return Optimal parameter set $\Theta^*$.

With $11N + 2$ parameters to be found, the optimization procedure associated with parameter estimation is computationally intensive. In addition, the search space in the optimization procedure may be highly nonlinear and may contain many local minima. Evolutionary algorithms have been extensively used in solving such nonlinear optimization problems [39], [40]. In this work, a new pseudo-coevolutionary algorithm is adopted to solve this problem, as outlined in Algorithm 2. Specifically, the procedure involves two co-adaptive simulated-annealing-based optimization processes:
1) In the main procedure, we tune all the $11N + 2$ parameters. Then, the Root Mean Square Percentage Error ($\mathrm{RMSPE}_j$) is utilized to measure the difference between the real values of confirmed cases and the estimated values generated by the model with an optimal parameter set $\Theta$, i.e., with $n$ denoting the number of data points,

$$\mathrm{RMSPE}_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\hat{C}_j(t_i|\Theta) - C_j(t_i)}{C_j(t_i)}\right)^{2}} \times 100\%. \tag{18}$$

2) In the sub-procedure, we identify the regions (values of $j$) corresponding to the $m$ largest $\mathrm{RMSPE}_j$, and tune the parameters for these regions.

It is worth noting that the model, in the form of a set of differential equations, is expected to generate solutions that are the progression profiles.
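As a minimal sketch of the evaluation step in the pseudo-coevolutionary procedure, the following Python code computes a root mean square percentage error per region and selects the m regions with the largest error for further tuning. The function names (`rmspe`, `worst_regions`), the use of NumPy, the exclusion of days with zero confirmed cases, and the sample numbers are illustrative assumptions and are not taken from the authors' implementation; the RMSPE is written in its standard textbook form.

```python
import numpy as np

def rmspe(observed, estimated):
    """Root mean square percentage error between two time series, in percent.

    Days with zero observed cases are skipped to avoid division by zero
    (an assumption of this sketch, not stated in the paper).
    """
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    mask = observed > 0
    rel = (estimated[mask] - observed[mask]) / observed[mask]
    return float(np.sqrt(np.mean(rel ** 2)) * 100.0)

def worst_regions(errors, m):
    """Indices of the m regions with the largest error, largest first."""
    order = np.argsort(errors)[::-1]
    return order[:m].tolist()

# Hypothetical confirmed-case series for four regions (observed, estimated).
observed = [[100, 200], [50, 80], [10, 40], [300, 600]]
estimated = [[110, 180], [60, 60], [10, 41], [330, 540]]
errors = [rmspe(o, e) for o, e in zip(observed, estimated)]
print(errors)
print(worst_regions(errors, 2))
```

In a full fitting loop, the indices returned by `worst_regions` would define the subset of regions whose parameters are re-optimized in the next pass, while the remaining regions keep their current values.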
Here, we actually used real data as the solutions to be matched by the model, and the numerical optimization procedure adopted ensures that the model gives a solution which resembles the real data when all parameter values are correctly found. Thus, when we get the candidate parameters, it is automatically guaranteed that the parameter values can indeed produce the same data within a small tolerance. In other words, our approach is founded on a rigorous data fitting procedure that generates the parameter values, and we did not "estimate" the parameter values based on some vague rationale or heuristic method.

The pandemic progression profiles of 175 regions in the seven selected countries were examined. Data fitting of the model described by (10) was performed using historical daily data of confirmed and recovered cases up to June 17, 2020. For each country, we applied the parameter identification procedure repeatedly to identify 100 suitable candidate sets of parameters that satisfied the fitting criteria. For each set of parameters, we also performed a separate simulation run to generate propagation trajectories. Fig. 5 shows the mean estimated cumulative number of infected cases generated by the sets of suitable candidate parameters, together with the official number of infected cases, for 7 selected regions that were epicenters in the seven countries before June 17, 2020, including New South Wales (Australia) and Quebec (Canada). From these simulation runs, we estimated the total number of COVID-19 patients with the 95% confidence interval, as detailed in Table IV for each country. The numbers of confirmed and unconfirmed cases, and the total number of cases, are presented in Fig. 5(h).
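The paper does not state how the 95% confidence interval is obtained from the candidate parameter sets. One common choice, shown here purely as an assumption, is a normal-approximation interval for the mean over the simulation runs. The function name `mean_with_ci95` and the sample totals are hypothetical.

```python
import numpy as np

def mean_with_ci95(run_totals):
    """Mean of per-run totals with a normal-approximation 95% interval.

    Uses mean +/- 1.96 * (sample standard deviation / sqrt(number of runs)).
    This is one possible convention; the source does not specify its method.
    """
    x = np.asarray(run_totals, dtype=float)
    mean = x.mean()
    half_width = 1.96 * x.std(ddof=1) / np.sqrt(x.size)
    return float(mean), float(mean - half_width), float(mean + half_width)

# Hypothetical totals (in millions) from a handful of simulation runs.
totals = [7.2, 7.5, 7.4, 7.3, 7.6]
print(mean_with_ci95(totals))
```

With 100 runs per country, the interval narrows roughly as one over the square root of the number of runs, so a tight interval reflects agreement among candidate parameter sets rather than the accuracy of the model itself.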
Moreover, the percentage of population infected in the seven countries is graphically presented in Fig. 5(i). Results showed that the USA had the highest number of actual infected cases (7,406,623 cases), which was four times the confirmed number and accounted for 2.2567% of its population. In terms of active intervention, all seven countries have an estimated $k^{(c)}_j$ of less than 1, based on data collected until June 17, 2020. This suggested that reasonable efforts had been made by these countries in controlling the spread of COVID-19 before June 17, 2020. To illustrate the small variation of parameters found using our parameter estimation method, Fig. 6(a) shows the statistics of $\alpha_j$ and $\beta_j$ for selected regions, and Fig. 6(b) shows the statistics of $k^{(c)}_j$ for the seven countries under study. The mean estimated ratio of unconfirmed to confirmed cases is 3.3092 for the seven selected countries under study. As of July 26, 2020, the worldwide confirmed number of cases had reached 16 million, and using this same ratio, it can be estimated that 69 million (16 × 4.309, since the total equals 1 + 3.309 times the confirmed number) people would have actually been infected. Furthermore, for comparison purposes, we estimated the mortality rates for 163 countries and the distribution is shown in Fig. 6(c), with the mean mortality rate being 3.49% (CI, [2.84, 4.15]%). The estimated death toll would thus be 1.8543 million (cf. the official death toll of 654,327).

To assess the effect of enhancing testing capacity, we re-ran the simulation with a 10-fold increase in the testing capacity for each country since February 1, 2020, while keeping other parameters unchanged. Fig. 7 shows a drastic reduction in the number of unidentified cases as well as the total number of cases. Specifically, our model estimated 64.62% (Australia), 24.82% (Canada), 35.07% (Italy), 44.27% (Japan), 35.39% (Spain), 53.66% (UK) and 39.90% (USA) of COVID-19 patients being unidentified or unconfirmed.
This translates to much reduced ratios of unconfirmed to confirmed cases as well as a much reduced total number of infected cases, as given in Table V. The numbers of confirmed, unconfirmed, and total cases in the seven countries, with a 10-fold increase in testing capacity, are shown in Fig. 7(h), and the corresponding percentages of population infected are shown in Fig. 7(i). The total number of infected cases would reduce by 4.65% (Australia), 58.63% (Canada), 37.51% (Italy), 52.81% (Japan), 10.88% (Spain), 52.15% (UK) and 57.03% (USA), and the overall percentage of population infected would reduce to less than 1%.

Our results have highlighted an alarming number of unidentified COVID-19 patients who have been allowed to move freely and interact with the susceptible population. The current official or confirmed figures of infected cases represent only 23% of the actual figures (1/(3.309+1)). Authorities could have easily underestimated the scope of the outbreak, and the public could have been ill-informed of the risk of getting infected, as official information had omitted the existence of 77% of the transmission chains. Moreover, the countries under study are all developed countries with high testing capacity and advanced medical technology. We can thus reasonably expect that other countries would have an even higher ratio of unconfirmed to confirmed cases. As of July 26, 2020, the official number of infected cases was 16 million, and we may reasonably project that more than 69 million people would have actually been infected. Our study also showed that the level of active intervention applied in the seven developed countries, as reflected by parameter $k^{(c)}_j$, was generally satisfactory up to mid-June 2020. The situation, however, could quickly deteriorate as the transmissibility of the disease remains high due to the high ratio of unidentified patients who are actively infecting others in the community.
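The arithmetic linking the ratio of unconfirmed to confirmed cases with the figures quoted above can be checked with a short Python sketch. The function names are hypothetical; the inputs (a ratio of 3.309 and 16 million confirmed cases) are the values reported in the text.

```python
def total_infected(confirmed, ratio_unconfirmed_to_confirmed):
    """Total infections = confirmed + unconfirmed = confirmed * (1 + ratio)."""
    return confirmed * (1.0 + ratio_unconfirmed_to_confirmed)

def confirmed_share(ratio_unconfirmed_to_confirmed):
    """Fraction of all infections that appear in the confirmed count."""
    return 1.0 / (1.0 + ratio_unconfirmed_to_confirmed)

ratio = 3.309
print(total_infected(16e6, ratio) / 1e6)   # total infections, in millions
print(100 * confirmed_share(ratio))        # confirmed share, in percent
```

The two printed values come out at roughly 69 million and 23%, matching the projection and the confirmed share stated in the discussion.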
Our study has pinpointed the importance of increasing the testing capacity, and our results showed that a drastic 10-fold increase in testing capacity since February 2020 could have brought the total number of infected cases down by about 50%, which means over 300,000 lives could have been saved. Besides, with a drastically reduced number of unidentified cases, authorities would get hold of more accurate and realistic information regarding the progression of the pandemic, and hence would be able to formulate and implement effective health responses. In the absence of effective vaccination, mass testing is thus the available practical option for all governments to consider adopting in order to contain the spread of COVID-19 as soon as possible. The main challenge in the modeling and prediction of COVID-19 pandemic progression has been the large number of factors that need to be considered in order to achieve the needed accuracy. Mobility of people, information latency (missing and delayed reporting), country or region specific transmission characteristics, and varying levels of intervention are important factors that affect the rapidity and extent of the spread of the virus. In this work, we propose a new model that takes these factors into consideration. The model considers multiple regions with different sets of parameters (characteristics) in one system, incorporates intercity travel data, distinguishes unconfirmed cases from the official data, allows active intervention to be included as a control parameter, as well as takes into account the effect of testing capacity. The model inevitably has a large parameter set, and requires advanced computational techniques for parameter estimation. In this work, we adopted an evolutionary algorithm to perform parameter extraction, which then permits prediction of future progression profiles. 
The model and the computational method have been applied to study the spread of the COVID-19 pandemic in 175 states or regions in Australia, Canada, Italy, Japan, Spain, the UK and USA. Our results showed that the number of unconfirmed infected individuals was, on average, 3.309 times the official confirmed figure, so that the actual number of infected individuals was about 4.309 times the confirmed count, with the USA having the highest number of unidentified patients. The actual total number of infected cases could have approached 70 million. Our study has also focused on the impact of increasing the testing capacity, and we showed that the scope of the COVID-19 outbreak would have been halved had governments provided 10 times more tests since February 2020. As the pandemic continues to progress, a tracking system based on the model and results of this study may be developed to help healthcare professionals assess the scope of the pandemic and formulate effective and timely measures to control its progression.