key: cord-0531717-m8bgvjjh authors: Agrawal, Manindra; Kanitkar, Madhuri; Phillip, Deepu; Hajra, Tanima; Singh, Arti; Singh, Avaneesh; Singh, Prabal Pratap; Vidyasagar, Mathukumalli title: SUTRA: A Novel Approach to Modelling Pandemics with Applications to COVID-19 date: 2021-01-22 journal: nan DOI: nan sha: 53b22bed9123c7596039451d06139fa5d82a9ed1 doc_id: 531717 cord_uid: m8bgvjjh Covid-19 pandemic has two characteristics: (i) asymptomatic cases (both detected and undetected) that could result in fresh infections, and (ii) time-varying characteristics due to new variants, NPIs etc. We developed a model SUTRA (Susceptible, Undetected, Tested positive, and Removed, Analysis) for predicting the course of COVID-19. All parameters in the SUTRA model can be robustly estimated purely from data and can be re-estimated at any time in the pandemic. The model also indicates when recalibration is needed. SUTRA can also estimate the number of undetected cases. To the best of our knowledge this is the only model with all these capabilities. The SUTRA approach can be applied at various levels of granularity, from national to district, and to any country with data on daily new cases and recoveries. The approach has been validated on the pandemic in India during all three waves, and also on many other countries. A broad conclusion is that the best way to handle the pandemic is to allow the disease to spread slowly in society, and a"zero-COVID"policy is not sustainable. COVID-19 needs a dynamic model like SUTRA which can help plan logistics and interventions by policy makers. The COVID-19 pandemic caused by the SARS-CoV-2 virus has by now led to more than 500 million cases and more than six million deaths worldwide, as of April 23rd, 2022 [1] . By way of comparison, the infuenza epidemic of 1957 led to 20,000 deaths in the UK and 80,000 deaths in the USA, while the 1968 influenza pandemic led to 30,000 deaths in the UK and 100,000 deaths in the USA [2] . In contrast, the COVID-19 pandemic has already led to more than one million deaths in the USA and more than 170,000 deaths in the UK [2] . Therefore the COVID-19 pandemic is the most deadly since the Spanish Flu pandemic which started in 1918. In the USA, 675,000 people, or 0.64% of the population, died in that pandemic [3] , compared to 0.33% of the population in the current pandemic. Among large economies, the USA, UK, Italy, and Spain, have all registered more than 2,200 deaths per million population [1] . In these countries, the pandemic has gone through multiple "waves." For instance, the USA has witnessed three large peaks and three smaller peaks [1] . In contrast, India has seen three distinct waves, corresponding to the original, Delta, and Omicron variants. To date, India has witnessed 372 deaths per million population [1] . However, because of its large population, in absolute numbers India has registered the second largest number of cases after the USA, and the third highest number of deaths after the USA and Brazil [1] . In order to cope with a health crisis of this magnitude, governments everywhere required accurate projections of the progress of the pandemic, both in space and over time, and at various levels of granularity. In addition, decision-makers also required assessments of the relative effectiveness of different non-pharmaceutical interventions (NPIs) such as lockdowns. Over the past century or so, various epidemiological models have been developed, as reviewed briefly in the next section. All of these models are based on the premise that the disease spreads when an infected person comes into contact with a susceptible person. In the initial SIR model [4] , the population was divided into only three compartments (Susceptible, Infected, and Removed / Recovered). Subsequently an intermediate category of E (Exposed) was introduced between S and I [5] . In this model, interactions between S and E do not lead to fresh infections. However, a distinctive feature of the COVID-19 disease is the presence of a huge number of undetected (as opposed to exposed) persons, who are capable of infecting others through contact, but are not explicitly identified by the health authorities owing to their not showing any symptoms. Thus, without reliable estimates of the numbers of these undetected patients, it is not possible to make accurate predictions about the course of the pandemic. The difficulty of predicting the future course of the pandemic is compounded by the emergence of new variants of concern as well as time-varying strategies adopted by countries to control the spread. Most recently, the Omicron variant was discovered in South Africa in November 2021. In contrast with previous variants, the Omicron variant emerged only after large parts of the world had either been exposed to previous versions of the SARS-CoV-2 virus, or had been vaccinated, or both. Yet, the impact of the Omicron variant varied widely: In some countries, the peak case loads during the Omicron waves were many times higher than the previous peaks, while in other others, peak case loads were the same as, or even lower than, previous peaks. It would be highly desirable to have a model that provides an understanding of the pandemic trajectory in different countries leading to an explanation of differences as above. Against this background, in this paper we have made the following contributions: We created a methodology called SUTRA (Susceptible, Undetected, Tested positive, and Removed, Approach). 1 Aside from the conventional parameters of contact rate β and recovery rate γ, we introduce two new parameters, namely the detection rate , and the reach ρ. All of these parameters can be robustly estimated purely from data on daily new cases and recoveries (or equivalently, daily new cases and active cases) with the help of one external calibration as explained later. Moreover, we explicitly take into account the changes in the pandemic parameters over time, which we refer to as "phase changes." These phase changes can arise due to non-pharmaceutical interventions such as lockdowns, emergence of new variants, "COVID fatigue" leading to non-compliance with COVID-appropriate behaviour, etc. The parameter estimation method proposed by us permits their recomputation at any stage in the pandemic. Most of the previously available parameter estimation methods, even for simple SIR models, applied only at the start of the pandemic, when almost everyone is susceptible; see [5] for example. Further, our methodology identifies when it is necessary to recompute the parameters. To the best of our knowledge, ours is the only modeling methodology that has all these capabilities. For this reason, our methodology is applicable to multiple countries, and at various levels of granularity, from nationwide to district and in-between. Moreover, by estimating changes in various parameters, we are able to quantify the waning of immunity over time or due to the emergence of new variants. The past two years have witnessed different countries adopting different approaches to combating the pandemic. Our methodology has allowed us to compare and contrast these different approaches. At one extreme is the "zero COVID" strategy, in which the government imposes quite drastic policies regarding quarantines, lockdowns, shutting down schools and colleges etc., while in parallel vaccinating the population. In this approach, the authorities are counting solely on vaccine-induced immunity to protect the population from multiple variants of the SARS-CoV-2 virus. Such countries are characterized by high vaccine penetration, but low sero-prevalence. At the other extreme lies an approach that allows the pandemic to run through the population without relying much on vaccination. This approach relies solely on natural immunity (due to prior exposure to the virus) for protection. Countries that adopt this approach are characterized by low vaccine penetration, but high sero-prevalence. In between the two, lies an approach that might be called "controlled spread," in which the pandemic is allowed to work its way through the country in a controlled manner, while also vaccinating the population as quickly as possible. In this approach, the population is protected through a combination of both natural immunity and vaccination. Countries that adopt this approach have both high vaccine penetration and high sero-prevalence. As a part of our studies, we have analyzed thirty six countries, out of which detailed analyses are presented for six countries, and ondensed analyses are presented for the rest: These six countries are: Australia and Singapore. Followed zero COVID approach until recently. South Africa. Relied on natural immunity. India and UK. Followed controlled spread approach. US. Controlled spread approach but comparatively lower levels of vaccination and natural immunity. The efficacy of each approach can be quantified by applying the SUTRA methodology to various countries. Our analysis shows that a "zero COVID strategy" does not succeed in the long run. Eventually the virus breaks through leading to massive increases in the number of cases. The best approach appears to be to permit the pandemic to work its way through the population, while achieving full vaccination, as India and UK have done. More details can be found in the Results and the Discussion sections. By using the SUTRA methodology, we are also able to quantify the waning of immunity over time in thirty six countries, spread over all continents representing more than half the population of the world. Our conclusions are very clear: Natural immunity conferred by prior exposure to any variant of SARS-CoV-2 confers good protection against infection by the Omicron variant, while vaccine-induced protection is bypassed almost totally. This reinforces the point made above, namely that a zero COVID policy is not sustainable in the long run. Perhaps the earliest paper to propose a pandemic model incorporating asymptomatic patients is [6] . In this paper, the population is divided into four categories: S, A (for Asymptomatic), I (for Infected) and R. Interactions between members of S and A, as well as between members of S and I, can lead to fresh infections. In that paper, it is assumed that almost all persons in A escape detection, while almost all persons in I are detected by the health authorities. While the SAIR model of [6] is a good starting point for modelling diseases with asymptomatic patients, it does not directly apply to COVID-19, for the following reasons: Due to contact tracing, some fraction of A do get detected. In fact, detected asymptomatic cases are often comparable to detected symptomatic ones. However, daily reported cases data typically does not provide this division and hence it becomes difficult to estimate number of symptomatic cases even assuming that all symptomatic cases get detected. Therefore, in the present paper we propose a different grouping, namely: S = Susceptible Population, U = Undetected cases in the population, T = Tested Positive, either asymptomatic or symptomatic, and R = Removed, either through recovery or death. This leads to the SUTRA model, where the last A in SUTRA stands for "approach." For discussion purposes, the category R of removed can be further subdivided into R U denoting those who are removed from U , and R T denoting those who are removed from T . As in the conventional SAIR model [6] , interactions between members of S on the and members of U or T , can lead to the person in S getting infected with a certain likelihood. A compartmental diagram of the SUTRA model is shown in Figure 1 . The term βU S represents the assumptions that (i) recent cases have higher chances of getting detected, and (ii) number of new cases do not change dramatically over a few days and so number of detected cases are proportional to βU S, number of most recent cases. A few additional assumptions have been made that do not affect the dynamics significantly, but greatly simplify parameter estimation. Specifically, • It is assumed that the removal rate for both compartments T and U is the same. This can be justified because, due to contact tracing, an overwhelming majority of patients in T are asymptomatic, and those people recover at the same rate as the asymptomatic people in U . Even for the small fraction in T who develop complications and pass away, the time duration is very close to that of those who recover. • There is no interaction shown between the T and S compartments. In most countries, those who test positive (whether symptomatic or not) are either kept in institutional quarantine, or told to self-quarantine. In reality, there might still be a small amount of contact between T and S. However, neglecting this does not significantly change the dynamics of the model, and greatly simplifies the parameter estimation. With these considerations, the governing equations for the SUTRA model are: As is customary, these quantities denote the fraction of the population within each compartment, so that There are four parameters in this model, namely β, γ, , ρ, and a constant of integration c. The interpretation of these parameters is as follows: • β = The likelihood that contact between a susceptible person and an asymptomatic or symptomatic person leads to an infection of the susceptible person; it is called the "contact rate." • γ = Removal rate, including both recoveries and deaths. • = Governs the rate at which infected patients in U move over to T . As shown later, it also equals the ratio T /(U + T ) and is thus called the "detection rate." • ρ = The ratio between the "effective" population P within reach of the pandemic, and the total population P 0 . This last bullet requires a little elaboration, because the symbol ρ does not appear in the equations (1) . Recall that the various symbols in (1) denote the fraction of the population in each compartment. However, the actual data available to the modeller consists of raw numbers of some of these quantities. To arrive at the fractions, one must divide the raw numbers by the effective population of the group under study. This effective population P equals ρP 0 , where P 0 is the total population (for example, about 1.4 billion for India), and ρ is the "reach" of the pandemic. When the pandemic starts, the reach is very small, and increases as the pandemic spreads. Eventually it reaches a value very close to 1. However, as the pandemic progresses still further, and the immunity conferred via prior exposure begins to wane, the reach ρ can go beyond 1, denoting that some persons who were previously exposed and immune to the infection, lose their immunity and are vulnerable to become infected again. Similarly, vaccination of susceptible persons can move them outside the reach, thereby causing a reduction in ρ. This point is brought more clearly when we analyze individual countries. In addition to the four parameters mentioned above, there is also a constant of integration c, which is chosen to ensure continuity of the trajectories when there is a phase change. One of the distinctive features of our approach is a methodology for estimating the values of all the parameters in the pandemic model from reported raw data on the number of daily new cases N T , the total number of active cases T , and the number of daily removals R T (including both recoveries as well as deaths). It is shown in the Supplementary Material that these quantities satisfy the following fundamental relationship: where P 0 denotes the population of the underlying society (e.g., 1.4 billion for India), and In this equation, c is a constant of integration, which is adjusted to ensure that the trajectories are continuous when phase changes occur. In order to apply this fundamental relationship to estimating the parameters, we first discretize the fundamental relationship by averaging all quantities over some fixed number of days ∆. In every country, the data has a very pronounced weekly periodicity, due to various intrinsic factors. Hence we chose ∆ = 7. However, in principle, any integer could be used. Once ∆ is chosen, define vectors as follows In these equations, the index t varies over the duration of the current phase. Then the following linear regression problem is solved: Once the linear regression problem is solved, the quality of the fit parameter, usually denoted by R 2 , is computed, as follows: with the optimal parameter choices. The closer R 2 is to one, the better is the quality of the fit. Once the quantitiesβ,ρ are estimated as above, the following forward projection method is used: The above methodology is implemented in the following manner: The linear regression problem is solved on a daily basis with new data. When the quality of fit R 2 falls below an accepted threshold, denoting a significant change in one or more parameter values, a new "phase" is started. Once the parameters in the SUTRA model are estimated, it is possible to estimate the number of undetected cases, and the fraction of population under the reach of the pandemic, thereby leading to accurate projections about its temporal and spatial evolution. Previous subsection shows how to estimate parameter values for all phases, provided the following two values are known: and c for the first phase. Note that there is no previous phase for which we know the parameter values. We may set c = 0 for the first phase since there is no requirement of continuity from previous phase. The value of for the first phase, say 1 , needs to be provided as an input to the model, and is called calibrating the model. We can calibrate the model in two ways: • A sero-survey at time t provides a good estimate of fraction of population infected until time t − δ, where δ equals the time taken for antibodies to develop. Once we set 1 , the model can compute fraction of infected population at all times. So we can choose a suitable 1 that ensures that model computation matches with the sero-survey result at time t − δ. • When the pandemic has been active long enough in a region without major, long-term restrictions, we may assume that it has reached all sections of society, making ρ close to 1. Again, we can choose 1 that ensures that the reach of the pandemic is close to 1 at suitable time. One more point needs to be made about using serosurveys to calibrate the SUTRA model. Many serosurveys suffer from significant sampling biases. For example, if survey is done using residual sera from a period of high infection numbers, it is likely to significantly overestimate the seroprevalence because a large fraction of uninfected persons would not venture to give blood sample in such a period. In order to minimize sampling biases, therefore, one should use serosurveys done during a period of low infection numbers. Even then, some uninfected people may not participate making the estimates higher than actual. To further reduce bias, one should ideally be able to use multiple serosurveys as well as use the fact that reach is close to 1 by a given time. However, there are regions where neither an accurate sero-survey is available, and it is evident that reach is nowhere close to 1. For such regions, calibration cannot be done with any confidence, and so estimation of all parameter values is not possible. Even for such regions, the model can still compute values ofρ = ρ (1 − c) andβ = β(1 − )(1 − c) for all phases. This ensures that trajectories of T and N T , detected active and new cases respectively, can be simulated well. The real time data for India and other countries was sourced from [7, 1] . We downloaded an extensive list of serosurveys, carried out in various countries, and maintained by the site [8] , eliminated surveys that were not done at national level, or had small sample sizes, or had high risk of bias. Nineteen countries remained after this pruning. These sero-surveys were used together with the SUTRA model to capture the pandemic trajectories in these countries. We also carried out simulations for seventeen more countries based on following criteria: 1. All continents are represented well (4 from Africa, 2 from North America, 4 from South America, 13 from Asia, 12 from Europe, and one from Australia) 2. Populous countries are simulated (except China for which it is not possible to calibrate the model). More than half the world's population lives in these countries. 3. It is likely that reach was close to maximum in these countries at the time of Omicron's arrival, allowing us to calibrate the model. The percentage of population that has natural immunity in a country at the time of onset of the Omicron wave can be determined from the SUTRA simulation of the country after calibration. The percentage of the population in a country that has received at least one dose of vaccination at the onset of Omicron wave is taken from the site [9] . To estimate the extent of hybrid immunity (that is, the percentage of the population that has both vaccine and natural immunity), we assume that the two types of immunity are independent random variables, implying that the fraction with hybrid immunity is the product of vaccine immunity and natural immunity fractions. Without the independence assumption, it is not possible to estimate the extent of hybrid immunity or vaccine-only immunity. A property of the SUTRA model is that the vaccination induced immunity in a susceptible population results in reduction of the reach parameter ρ. This is because a susceptible person becoming immune is equivalent to the person moving outside the reach of the pandemic. Therefore, a gain of vaccine immunity among a fraction f of the susceptible population results in a reduction in ρ by an additive factor of f . Similarly, a loss of vaccine immunity or natural immunity (both can be viewed as people that moved outside the reach and have now come back within the reach) among a fraction g of the population results in an increase in ρ by an additive factor of g. This implies that a change in the value of ρ is due to a change in the spread of the infection in the population, gain of immunity due to vaccination, as well as loss of immunity. Therefore, ρ may not be close to 1 even when the pandemic has spread over entire population. To capture this, we define ρ actual to denote the actual reach of pandemic, so ρ equals ρ actual plus change in immunity levels. The countries chosen have the property that ρ actual is close to maximum at the onset of Omicron wave, and so increase in ρ during Omicron wave can be taken to be mostly due to loss of immunity. This allows us to obtain a good estimate of loss of immunity for the countries. In order to reduce the impact of the calibration value and method chosen, we use the relative increase in reach, defined as 1 − reach before Omicron reach after Omicron , to estimate the immunity loss. The SUTRA methodology was applied to thirty six countries, using publicly available data. Due to space limitations, complete results are presented for six countries, namely (in alphabetical order): Australia, India, Singapore, South Africa, the UK, and the USA. For each country, the total number of phases, duration of each phase, and the parameter values during each phase, are included in the Supplementary Material. Using these estimated values, forward predictions were made regarding the number of daily new cases in each of these six countries. These predicted trajectories were then compared with the actual. These results are presented here, in the main body of the paper. The results of the immunity loss analysis are shown in Table 1 . Figure 8 plots 1 − ρ against vaccination percentage before the arrival of Omicron to ascertain whether reach has maximized in the countries studied. Also, relative increase in the reach ρ, which captures the loss of immunity, is plotted as a function of the percentage of the population having natural immunity before the arrival of the Omicron variant ( Figure 9 ) and vaccination-only immunity ( Figure 10 ). We have presented predicted and actual trajectories for six different countries, spanning the entire duration of the COVID-19 pandemic, starting from 2020 until now. During this period, there emerged several "variants of concern" of the SARS-CoV-2 virus. In India for example, there were three distinct waves, which corresponded to the original virus, the Delta variant, and the Omicron variant. Despite so many varying factors, the fit between the predicted and the actual trajectories was excellent, in all countries, and throughout the entire duraction of the pandemic. Some notable successes were predicting the timing and height of the peak of second wave of India ten days in advance [10] , predicting timing of the peak of third wave in India as well as many states of the country [11] , predicting height and timing of the peak of Delta-wave in UK ten days in advance [12] , and predicting height and timing of the peak of Delta-wave in US more than a month in advance [13] . The predictions for India and its states were extremely useful to the policy-makers in planning the required capacity for providing health care, and scheduling nonpharmaceutical interventions such as school reopenings. The parameter table of a country enables us to quantify the impacts of various events like the arrival of a new mutant, or a lockdown. Moreover, through the reach parameter, we can also explain the somewhat mysterious phenomenon of multiple peaks occurring in rapid succession that was observed in many countries. Below, we discuss in detail the progression of the pandemic in six countries. The calibration for India was done using sero survey done in December 2020 [14] , a period of low infection. Estimates of seropositivity computed by the model were matched with two other serosurveys [15, 16] , and very good agreement was found. Further, our model showed that the reach was close to maximum by December 2021, a very likely scenario. Interestingly, the detection rate stayed almost unchanged at 1/32 throughout the course of the pandemic. The parameter table for India (Supplementary Material) shows the following for each wave. First wave (March to October 2020): The strict lockdown imposed at the send of March 2020 brought down the contact rate β by a factor of two. The reach was very small until May (≈ 0.04) but increased to 0.38 between the end of June and the end of August. Reverse migration of workers, followed by a partial lifting of lockdown, occurred during this period. Second wave (February to July 2021): The arrival of the Delta variant caused the value of β to rise to 0.4 in February 2021. As the variant began to spread in different parts of the country, most states imposed restrictions, which reduced the nationwide β to 0.28 by April. In the same month, ρ increased sharply to 0.83. The removal of all restrictions by August caused β to increase to 0.6. This suggests that the Delta variant was more infectious by a factor of ≈ 2 compared to original variant. Third wave (December 2021 to February 2022): The arrival of the Omicron variant caused β to increase sharply to 1.54 and ρ to increase to 1.05 (from 0.93) by the end of December. In January, mild restrictions were imposed across the country, causing β to drop to 1.22. These were lifted in February, and β went back up to 1.66. Calibration for UK was done using two of the three serosurveys reported in [17] . During the period of serosurveys, the infection numbers were going up and down, which made calibration a little tricky. We used the numbers from the last two surveys as well as the observation that reach has remained stationary from November 2021 (suggesting that ρ actual has been around 1 since then) for calibration. The detection rate started at 1/9.3 and over time increased to almost one in three cases. The parameter table for UK (Supplementary Material) shows the following for each wave. First wave (March to July 2020): The strict lockdown imposed in March 2020 brought down the contact rate β by about 40% in mid-April. However, almost simultaneously, ρ increased three-fold causing another peak. By July, β was back up to 0.24 after removal of restrictions. Second wave (September 2020 to January 2021): This wave was primarily caused by increase in value of ρ from 0.08 to 0.6 (there was a new mutant as well that caused β to increase slightly). It had two peaks: first in October-end when ρ increased to 0.3 and second (a bigger one) in January 2021 when ρ further increased to 0.6. Third wave (February 2021 to October 2021): Although the numbers started increasing in July, the Delta variant appears to have arrived by February as indicated by a jump in value of β from 0.36 to 0.77 by March. This jump did not cause an increase in numbers because reach stayed around 0.6 and more than 85% of population within reach had natural immunity. There were three peaks in quick succession: The first caused by increase in ρ to 0.76 in July, the second caused by further increase in ρ to 0.92 in August ( when β came down to 0.44 during this period, likely caused by precautions taken by people due to high numbers), and the third caused by increase in β to 0.6 in addition to a slight increase in ρ. Fourth wave (November 2021 to April 2022): In November the Omicron variant arrived causing β to increase further. The wave had three peaks (although the second one got a bit messed up due to reporting of very large numbers on 31st January of backlog cases). These peaks were all caused by increase in β -to 0.98 in December, to 1.2 in January-end, and then to 1.37 in March. Calibration for US was done using the serosurvey [18] . The samples were taken from life insurance applications. The calibration was further supported by the fact that ρ has not changed since December, suggesting that ρ actual was close to 1 at the time. The detection rate has slowly decreased from 1/3.5 to 1/4.3 during the course of the pandemic. The parameter table for US (Supplementary Material) shows the following for each wave. First wave (March to August 2020): Restrictions imposed in March and April 2020 brought down the contact rate β by about 40% in mid-April. However, almost simultaneously, ρ increased to 0.04 causing a flat trajectory. In June, ρ increased further to 0.11 causing a peak in July-end. Second wave (September 2020 to February 2021): This wave was caused by an increase in values of both β and ρ. While β increased to 0.33 due to relaxations of restrictions, ρ increased in three steps to cause three successive peaks. Third wave (March 2021 to November 2021): There were two peaks separated by more than four months. The Delta variant appeared to have arrived in March causing β to increase to 0.62. However, it caused only a small peak since ρ more or less stayed unchanged until August. The reach started increasing in August to eventually become 0.7 by mid-September causing another peak. Fourth wave (December 2021 to April 2022): The Omicron variant started spreading in December, but the numbers did not increase much by December-end, since ρ did not change by much. Then the reach increased substantially to 1.08 which led to a very high peak. Calibration for South Africa was done using the serosurvey [19] . The calibration was further supported by the fact that ρ has not changed since November suggesting that ρ actual has been close to 1 since then. The detection rate has remained almost unchanged at 1/17 during the course of the pandemic. The parameter table for South Africa (Supplementary Material) shows the following for each wave. Fourth wave (August 2021 to February 2022): The model detected a sharp rise in β in August to 1.05. It caused an uptick in numbers that did not last long because ρ did not increase. Was this due to onset of Omicron? Possibly. In November, Omicron caused β to further increase to 1.68 and ρ to 1.04 resulting in a sharp peak. The reach has remained stationary since then. The only nationwide serosurvey available for Australia [20] has a wide error band, indicating that the value of lies between 1/18 and 1 in June 2020. This is not useful for calibrating the model. Moreover, the case load until August 2021 was extremely low -due to adopiting a zero-COVID policy -that the model was unable to estimate the parameter values with any certainty. The numbers started increasing from August and the model was successful in capturing the trajectory after that. After some experimentation, we fixed the initial value of = After the South African authorities announced the emergence of a new variant of concern (VOC), later named Omicron, the epidemiology community started analysing the ability of the Omicron variant to bypass immunity provided by vaccination, or prior exposure, or both. Our objective in this paper is to provide a quantitative analysis using the SUTRA model. But before that, we give a brief summary of the vast literature based on laboratory (as opposed to population-level) studies. Everywhere in the world where it was discovered, the Omicron VOC soon replaced all other variants and was responsible for a massive increase in cases. This was due to high transmissibility conferred by the mutation, ensuring a tight binding to the ACE 2 receptor facilitating immune escape [22] . The immune escape phenomenon was reported by many groups studying the neutralization activity of sera from both infected and vaccinated individuals; see [23, 24, 25, 26, 27] . The immunity conferred by complete vaccination decreased from 80% for the Delta variant to about 30% for the Omicron variant. People infected with the Delta were better off than those infected with the initial Beta variant. There was a complete loss of neutralizing antibodies in over 50% of the vaccinated individuals and the decrease in titres varied from 43-122 fold between vaccines [28] . A booster Pfizer dose could generate an anti-Omicron neutralizing response, but titres were 6-23 fold lower than those for Delta variant. Sera from vaccinated individual of the Pfizer or Astra Zeneca vaccine barely inhibited the Omicron variant five months after complete vaccination [29] . In addition, Omicron was completely or partially resistant to neutralization by all monoclonal antibodies tested [22] . Overall, most studies confirmed that sera from convalescent as well as fully vaccinated individuals irrespective of the vaccine (BNT162b2, mRNA-1273, Ad26.COV2.5 or ChAdOx1-nCoV19, Sputnik V or BBIBP-CorV) contained very low to undetectable levels of nAbs against Omicron. A booster with a third dose of mRNA vaccine appeared to restore neutralizing activity but the duration over which this effect may last has not been confirmed. Double vaccination followed by Delta breakthrough infection, or prior infection followed by mRNA vaccine double vaccination, appear to generate increased protective levels of neutralizing antibodies [30] . Viral escape from neutralising antibodies can facilitate breakthrough infections in vaccinated and convalescent individuals; however, pre-existing cellular and innate immunity could protect from severe disease [30, 31] . Mutations in Omicron can knock out or substantially reduce neutralization by most of the large panel of potent monoclonal antibodies and antibodies under commercial development. Studies also showed that neutralizing antibody titers against BA.2 were similar to those against the BA.1 variant. A third dose of the vaccine was needed for induction of consistent neutralizing antibody titers against either the BA.1 or BA.2.3,4 variants, suggesting a substantial degree of cross-reactive natural immunity [32] . Now we present our own analysis. One of the criteria we used for shortlisting countries was that ρ actual should be close to maximum before the arrival of Omicron. We verify his condition by plotting 1 − ρ against the percentage of susceptible population that was vaccinated before arrival of the Omicron variant. Since vaccination of the susceptible population reduces reach, the two quantities would be proportional if ρ actual was close to 1. Indeed, Figure 8 shows a very strong correlation between the two, providing more evidence that the increase in reach after arrival of the Omicron variant in these countries is primarily due to a loss of immunity. Figure 10 shows a strong correlation between levels of vaccine-only immunity and loss of immunity during Omicron wave. In contrast, Figure 9 shows a strong inverse correlation between levels of natural immunity and loss of immunity. The R 2 values of the straight-line fits in these figures are quite close to one, considering that the data sources are inherently noisy. Taken together, these plots show that Omicron bypassed vaccine-only immunity almost completely, but natural immunity provided very good protection. A clear conclusion is that countries following zero-COVID strategy -strictly control the spread and vaccinate entire population -suffered maximum during Omicron wave. Indeed, a perusal of Table 1 shows that in countries where the reach was very low before the arrival of the Omicron variant saw very large percentage increases in reach thereafter. Hence we conclude that a zero-COVID strategy does not work in the long run, while the controlled spread strategy is far superior. The country-wise case load history also supports this conclusion. The work of MA, DP, TH, Arti S, Avaneesh S, & Prabal S was supported by grants from CII and Infosys Foundation, and MV was supported by the Science and Engineering Research Board, India. We first construct a model for the evolution of the pandemic. It is a mean-field model, that analyzes the average values of various quantities at a societal level. This approach is in contrast to agent-based models that adopt a much more fine-grained analysis. The members of the group under study are divided into five nonoverlapping compartments, namely S (Susceptible), U (Undetected), T (Tested positive), R U (Recovered from the U compartment), and R T (Recovered from the T compartment). As is the convention, we use letters S, U , T , R U , and R T to these symbols denote the fraction of population in the corresponding compartment. At present, in most countries, persons in the group T (whether symptomatic or not) are mostly quarantined, and it can be assumed that for the most part, they do not come into contact with the susceptible population S. Therefore persons in group S get infected mostly through contact with group U of undetected infected patients, with a likelihood of β. Finally, it is obvious that all infected persons are initially undetected, and thus enter group U . In turn some part of U , call it where γ U is the rate of removal from U , and γ T is the rate of removal from T . Although removal rates for symptomatic (= γ I ) and asymptomatic (= γ A ) groups are reported to differ by a few days [33] , removal rates for U (= γ U ) and T (= γ T ) groups are much closer: Lemma 1. Let f be the fraction of symptomatic cases in T and g be the fraction of symptomatic cases in U . Then, We can assume that f > g since symptomatic patients are more likely to get tested. In case f and g do not differ by much (in US [34] , f − g < 1/8) or f is significantly smaller than 1 (in India [35] , f < 1/10), γ T ≈ γ U since g I − g A is not more than five days [33, 36] . Above justifies the following simplification: The equations then simplify to: Thus the model formulation is complete once we specify N T , the fraction tested positive at time t. One possibility is to assume that everyone in U is equally likely to get detected, so that N T = δU , for some δ > 0. This would lead tȯ This is the conventional approach, used in earlier models like SAIR and SEIR. However, detection of COVID-19 cases is biased towards more recently infected due to contact tracing. Divide T into two subgroups: T C containing those that are detected through contact tracing and T N C are the rest. Lemma 2. Suppose an infected person infects R 0 other persons. Further, suppose that upon detection of a positive case, all who came in contact with the infected person are also tested. Then, the expected number of days a person in T stays in U is R 0 +2 2(R 0 +1) k, where k is the average number of days a person in T N C stays in U . Proof. Observe that |T C | = 1 R 0 +1 |T | since for every initiating case, contact tracing will find R 0 additional cases on average. The cases detected via contact tracing would be infected after the initiating case, and therefore, between 1 and k − 1 days ago. These are expected to be uniformly distributed in the range [1, k − 1] and therefore, the expected number of days a person in T stays in U equals: The mean duration for which a symptomatic case remained infected was estimated to be around 13.4 in [33] . However, the movement from T to R T is likely to happen more quickly, since a person may stop infecting others while still being RTPCR positive. (This number is lestimated to be less than 10 days for symptomatic cases in [36] ). The average duration of stay in U for a person in T C is likely to be less than half of above duration, so k is less than 5. For Covid-19, the values of R 0 has been estimated to be in the range [3, 6] depending on the mutant. Therefore, the expected number of days a person in T stays in U is less than 4. The above analysis shows that the cases that move from U to T are significantly biased towards recently infected ones, and therefore, it is better to set N T to be proportional to a fraction of recently infected cases. This number can be taken to be proportional to βSU , the fraction of persons who got infected most recently, as the number of cases do not change significantly over a window of few days. Therefore we choose N T = βSU , with being another parameter of the model. With this assumption, the full SUTRA model becomeṡ S U T R βSU N T = βSU γT γU Figure 11 : Flowchart of the SUTRA model A compartmental diagram depicting the SUTRA model is given in Figure 11 . The acronym SUTRA stands for Susceptible, Undetected, Tested (positive), and Removed (recovered or dead) Approach. Susceptible, The word Sutra also means an aphorism. Sutras are a genre of ancient and medieval Hindu texts, and depict a code strung together by a genre. It is possible to introduce another parameter D denoting deaths, and write it asḊ = ηT . However, it is quite easy to estimate η as the ratio between the incremental death totals and the increase in cumulative positive test cases. Hence that relationship is not shown as a part of the SUTRA model. Defining M = U + T , R = R U + R T , we get from equations (6) and (7) thaṫ resulting in for an appropriate constant of integration c. Adding equations (6) giveṡ resulting in for some constant d. Since e −γt is a decaying exponential, it follows that, except for an initial transient period, the relationship M = (1/ )I holds. This in turn implies that How long is the transient period? Observe that the constant d equals M (0) − (1/ )T (0) which is close to zero since fraction of infected cases at the start of pandemic is very small. Therefore the transient period will not last more than a few days. As we will see later, such transient periods will recur at various stages of pandemic and all of them remain small. These simplifications allow us to rewrite equation (8) as: Rearrange (12) as The progression of a pandemic is typically reported via two daily statistics: The number of people who test positive, and the number of people who are removed (including both recoveries and deaths). However, there is a difficulty with the second statistic: there is no consensus on when to classify an infected person as removed. Some do it when RTPCR test is negative, some do it when symptoms are absent after a predefined period, and some others do it after a fixed period of time irrespective of symptoms. For the purpose of modeling, a person should be classified as recovered at the time when that person is no longer capable of infecting others. This is difficult to ascertain, for which reason this criterion is almost never used. Further, some countries (UK for example) do not report this statistic at all. In such a situation, we do not rely on reported data, and instead compute R T by fixing γ to an appropriate value as discussed in section on parameter estimation. Let T (t) denote the number of detected cases who are capable of infecting others on day t, R T (t) denote the number of detected cases that are removed on or before day t, and N T (t) denote the number of cases detected on day t. Note that, in this notation, all three are integers, and t is also a discrete counter. In contrast, in the SUTRA model, T , R T and N T are fractions in [0, 1], while t is a continuum. In order to infer these fractions from the case numbers, we observe that where P is the effective population that is potentially affected by the pandemic. Now we introduce the parameter measuring the spread of the pandemic. We define a number ρ, called the "reach," which equals P/P 0 , where P is the effective population and P 0 is the total population of the group under study, e.g., the entire country, or an individual state, or a district (this parameter is also introduced and studied in [37] ). The reach parameter ρ is usually nondecreasing, starts at 0, and increases towards 1 over time (situations where it decreases are discussed later). While the underlying population P 0 is known, the reach ρ is not known and must be inferred from the data. Substituting P = ρP 0 , (13) gives a relationship that involves only measurable and computable quantities T , R T and N T , and the parameters of the model, namely Eq. (14) is the fundamental equation governing the pandemic. It establishes a relationship between N T , T , and (T + R T )T , first of which is directly measurable and the remaining two are computable once γ is fixed. Finally, we integrate (14) over seven days because reported daily case numbers usually have a weekly periodicity to them. The integration will be a summation since T , R T and N T are available at only discrete time instants. This gives Note that sum for N T is shifted forward by one day since new infections reported on day t + 1 are determined by active infections and susceptible population on day t. The parameters ρ, β and are not constant, and vary over time. In the case of the reach parameter ρ, it increases in spurts, for example when the pandemic hits a new region. In the case of the contact rate β, it changes for following reasons: • Emergence of new and more infectious variants of the virus, which would spread faster than its predecessor. It takes time for the new variant to overtake whatever existed previously, which is why this factor would cause β to increase over a period. • Non-compliance with COVID guidelines. The β parameter measures the likelihood of infection when an infected person (from either U or T ) meets a susceptible person from S. Thus β increases if people do not wear masks, or fail to maintain social distancing, and the like. • The parameter can also decrease suddenly, with almost a step change, due to non-pharmaceutical interventions such as lockdowns. Finally, the parameter, which equals the ratio T /(U + T ), can increase due to more comprehensive testing. The presumption is that more testing will increasing T without increasing the total pool U + T . The changes in parameter values occur either as a slow drift over extended period of time, or as sudden rise and fall. We divide the entire timeline of the pandemic into phases, such that within each phase, the parameters are (nearly) constant. A phase change occurs when one of the parameter values changes significantly. It could be due to a quick change for reasons listed above, or accumulated slow change over an extended period. By convention, we include the duration of change in a parameter as part of new phase and call it drift period of the phase. The remaining duration of a phase is called stable period of the phase. When the value of changes, then the relationship T = M breaks down for certain period. When stabilizes to its new value, T converges to M after some time. How long will this duration be? Following lemma helps us estimate it. Lemma 3. Suppose a new phase begins at time t 0 with a drift period of t 1 − t 0 . Then, Proof. Integrating equation (8) over the drift period, we get: Therefore, where the second inequality uses the assumption that T (t − 1)e −γ ≤ T (t). It has been observed that the value of does not change significantly from one phase to next (due to testing strategy not changing in a major way over a short period), and active cases almost never decline by more than 10% in one day, and so T (t) ≥ T (t − 1)/1.1 > T (t − 1)e −γ . Therefore, | (t 1 ) − (t 0 )|/ (t 0 ) will be significantly smaller than one, which implies that M ≈ 1 T already by the end of drift period. Daily values of N T over a given time period define detected trajectory. Following algorithm computes parameter values for different phases given N T over a time period. Input: N T (t), for 1 ≤ t ≤ t e , T (0), R T (0), P 0 , and t p /* t e : last date for which data is available * P 0 : population, t p : last date for simulation */ 1. Fix γ = 0.1; 2. Set t = 1; 3. while t < t e do the following: (a) Find drift and stable periods of the phase starting at day t; (b) Compute valuesβ andρ for the phase; (c) Increase t to day after the end of the phase; 4. Compute and output the detected trajectory of the pandemic until t p ; In the following subsections we provide details of the algorithm along with necessary justifications. As discussed in the previous section, reported removal data does not provide a good estimate for γ. In [33] , median duration of infection for asymptomatic cases was estimated in the range [6.5, 9.5] and mean duration for symptomatic cases in the range [10.9, 15.8] days with a caveat that the duration reduces when children are included. In [36] , infection duration for symptomatic cases is observed to be less than 10 days. Since our groups U and T consist of a mix of asymptomatic and symptomatic cases, and it is likely that an infected person stops infecting others before becoming RTPCR negative, we take the mean duration of infection for both groups to be 10 days. This implies γ = 0.1. All our simulations are done using the above value of γ and show a good fit with the actual trajectories. We use equation (15) for computing the duration of current phase, its drift period, and associated parameter valuesβ and˜ . Consider first m days from the starting day of the current phase. From the input data N T and using the value γ = 0.1, we can compute T and R T values for the period. Represent values of t t−∆ T (s)ds, t+1 t−∆+1 N T (s)ds, and t t−∆ (T (s) + R T (s))T (s)ds (∆ = 7) for this period by mdimensional vectors u, v, and w respectively. Equation (15) can be rewritten for the period as: The values ofβ andρ can be estimated using standard linear regression. If R 2 -value of the estimate is not very high, it indicates that either a phase change has occurred within m days or drift period of the current phase is bigger than m. In either case, m needs to be changed. Repeat this until a high R 2 -value (≥ 0.98) is obtained. This gives a good estimate of parameter values of the phase. The phase may extend beyond m days though; so to detect phase boundary, we increase m until R 2 -value of the fit reduces. This algorithm is captured below. Input: N T (t), for t 0 ≤ t ≤ t e , T (0), R T (0), P 0 /* t 0 : starting date of the phase * t e : last date for which data is available, P 0 : population * r is a predefined threshold for R 2 value */ 1. Set R 2 = 0.0; m = 10; 2. while R 2 < r do the following: (a) Compute vectors u, v and w from data N T (t 0 ), . . ., N T (t 0 + m); (b) Use linear regression to computeβ andρ, and its R 2 -value; (c) If R 2 < r, set m = m + 1; (d) If t 0 + m > t e , exit with error; 3. Increase m and repeat until R 2 -value becomes < r or t 0 + m = t e ; The standard way is to find values forβ and˜ that maximize the R 2 -value given by where | · | denotes the Euclidean norm of a vector. When there are relatively few data points in a phase (which happens when the duration of the phase is short), or the data has significant errors, standard linear regression method fails to work at times (estimated parameter value becomes negative). In such situations we use a different method for estimation that is more tolerant to errors as described below. Let Find values ofβ > 0 and˜ > 0 that maximize the product R 2 = R 2 β · R 2 . This choice ensures that bothβ and˜ play almost equally significant roles in minimizing the error. Further, the desired maximum of R 2 β R 2 is guaranteed to exist: Lemma 4. When u is independent of v as well as w, there is a maxima of R 2 with R 2 β , R 2 ,β,ρ > 0. Proof. Let x = 1 β and y = 1 ρ . Then we have: (16) is always positive since u is independent of v as well as w. The numerator is a product of four linear terms in the unknowns x and y. Therefore the value of R 2 is positive inside the polygon defined by: and is zero on the boundaries. This guarantees that there exists at least one maxima inside the polygon. The only situation when the above method will not yield the desired maxima of R 2 is when u is dependent on either v or w. Former implies that T is proportional to N T for m days, or equivalently, S does not change over the period. This implies N = 0 = N T = T for the period. Similarly, latter implies that T is proportional to (T + R T )T for m days, or equivalently, T + R T does not change over the period. This also implies that N T = 0 = N . Either case occurs when the pandemic has effectively ended and there are no new cases for an extended period. The uncertainty in the parameter estimation is computed using the standard mean-square error formula for linear regression. We use it to compute 95% confidence interval ranges forβ andρ values. While the algorithm above detects phase boundaries reasonably well, it can be improved with manual intervention, at times significantly. Such intervention is also required to identify drift period of the phase since a point could also appear in drift period due to error in data. In this subsection, we show how to do this through a few examples for India. To visualize how well is equation (15) satisfied, we plot points for t 0 ≤ t < t 0 + m for estimated value ofβ. It is straightforward to see that (15) holds if and only if above points lie on a straight line passing through the origin with a slope of 1/ρ. This is illustrated by plotting the data for India between March 19 to May 19, 2020, which was the start of the pandemic in India. The plot in Figure 12 shows these points with regression provided values ofβ ≈ 0.18, 1 ρ ≈ 3918.4. It is clear that the points in the beginning lie on a very different line than last seventeen ones, indicating a phase change in April-end. Indeed, the simulation ( Figure 13 ) is way off the actual trajectory. Removing points after May 2 gives valuesβ ≈ 0.22, and 1 ρ ≈ 19074.6. Now the fit is better, but the points show a slow drift (Figure 14) . The simulation has improved, but is still not fitting well due to the drift (Figure 15 ). Removing twenty more points reduces the drift (Figure 16 ), withβ ≈ 0.33, 1 ρ ≈ 99741.5. R 2 value of the fit is greater than 0.999. The simulation shows an excellent fit ( Figure 17 ). We start a new phase from April 12th and plot the points up to June 30th. Last ten points show a clear deviation from the trajectory of the previous points in Figure 18 (β ≈ 0.14, 1 ρ ≈ 348.2), and simulation confirms the incorrect identification of phase in Figure 19 . It can be improved further by ignoring the initial few days as drift period and using the remaining points for estimating the values of parameters. Removing first four days ( Figure 22) givesβ ≈ 0.16, 1 ρ ≈ 975.8, R 2 > 0.999, and a better simulation ( Figure 23 ). In the plot, points in drift period are colored red. Note that the points oscillate around the line initially which is likely due to errors in reported data. The above two examples were for the situation when all data points in a phase are available, which implies that the prediction of trajectory is for the past. What about the future? We can predict the future as long as the current phase continues. The parameter estimation for the current phase will not be as precise as for past phases since full phase data is not available. In fact, if the current phase is in drift period, parameter estimation can be significantly off. However, one can detect if the phase is in drift period or stable period by observing the point plot P(t) for the current phase, which allows one to infer if the prediction is accurate or not. For example, trajectory for India was in the drift phase during the first half of April. This caused the model estimation to be significantly off: in a tweet on 14th April [38] , we predicted a peak at around 190K infections during April 20-25. The phase plot shows a clear drift (Figure 24 ) withβ ≈ 0.35 and 1 ρ ≈ 55.9. It stabilized by 23rd April leading to much better prediction by 29th April. The stability is clearly visible in the phase plot ( Figure 25 ). The estimated parameter values areβ ≈ 0.32 and 1 ρ ≈ 43.4. While addition of next fifteen data points made the predictions more accurate, the projection using only six data points of stable period was already reasonably accurate. The above calculations give us values of parametersβ andρ post the drift period of every phase. However, in order to simulate the course of the pandemic, it is necessary to have the values of the parameters during the drift period as well. Suppose d is the number of days in drift period, and b 0 and b 1 are the computed values of a parameter in the previous and the current phases. Then its value will move from b 0 to b 1 during the drift period. A natural way of fixing its value during the period is to use either arithmetic or geometric progression. That is, on ith day in the drift period the value is set Among these, geometric progression captures the way parameters change better: • When a new, more infectious, mutant spreads in a population, its infections grow exponentially initially. This corresponds to a multiplicative increase in β. • Similarly, a new virus spreads in a region exponentially at the beginning. This corresponds to a multiplicative increase in ρ. • A lockdown typically restricts movement sharply causing a multiplicative decrease in β. • A change in testing strategy typically gets implement fast in a region, causing a multiplicative change in . For these reasons, we assume that changes in parameters β, ρ, and are multiplicative. Further, terms 1 − and 1 − c do not change much during drift period since, as has been observed in actual simulations, both and c remain close to 0. Therefore, multiplicative changes in and c almost coincide with multiplicative changes in 1 − and 1 − c, and so we can assume that these also change geometrically. This leads to the nice conclusion that changes inβ andρ are also multiplicative. Once we have duration, drift periods, and parameter values for all phases, the trajectory of the pandemic can be computed easily. • P 0 is population of the region, • D i and d i are respectively duration and drift period of ith phase, •β i andρ i are respectively estimated values of parameters for ith phase, the trajectory of detected cases can be computed for the period k i=1 D i . Proof. Proof is by induction on t. Base case of t = 0 is given as input. Suppose T (t), and R T (t) are computed. Then, from (14) : And whereβ t andρ t are values of parameters on day t. Detected trajectory thus is captured by the (4k + 4)-tuple This shows that detected trajectory can be specified much more compactly than via k i=1 D i points N T (t). Similar to quantities associated with detected trajectory, define U = ρP 0 U , M = U +T , R = ρP 0 R, and N = ρP 0 N . The actual trajectory of the pandemic is given by the daily values of N (t). There is no way to measure it directly. Is it possible to compute this trajectory given a detected trajectory? It appears unlikely since detected trajectory only providesβ andρ values while actual trajectory is determined by four parameter β, ρ, and c. Givenβ andρ, one needs two more parameters and c for each phase to compute the entire trajectory: And R(t + 1) = R(t) + γM(t) whereβ t ,ρ t , t and c t are values of parameters on day t. Potentially, there may be infinitely many different actual trajectories for a given detected trajectory. We now prove that, given a detected trajectory and initial values M(0) and R(0), there are only finitely many actual trajectories. Further, we give a canonical way to identify a unique trajectory from them and show how to compute it. (9) and (11): giving us excellent approximations of 1 and c 1 ρ 1 . From this, we can compute giving values of all parameters for first phase, using which the trajectory can be computed for the first phase uniquely. Suppose there are at most 2 2 i−1 j=1 d j trajectories up to phase i − 1. Fix any one trajectory with values of four parameters in phase i − 1 being β i−1 , ρ i−1 , i−1 and c i−1 . Let t 0 = i−1 j=1 D j . We have: Phase i has a drift period of d i days and, as we have argued, the parameter values change multiplicatively during the period. Let i,j = i−1 x j , and 1 − c i,j = (1 − c i−1 )/y j for 1 ≤ j ≤ d i , where x and y are unknown multipliers by which the two parameters change every day. The final value of the parameters will be i = i−1 These numbers can be computed sinceβ i−1 ,β i ,ρ i−1 , andρ i are known. Then we can write: Therefore, both M(t 0 + j) and R(t 0 + j) are polynomials in x and y. It is straightforward to show that the degrees of M(t 0 + j) and R(t 0 + j) equal 2 j − j − 1 and 2 j−1 − j − 2 respectively. At the end of drift period, we have: For t 0 + d i < t ≤ t 0 + D i , it inductively follows that: Therefore, once the relationship is satisfied between actual and detected cases after drift period, it remains valid for rest of the phase. For what values of unknown multipliers x and y is the relationship satified? Equation (18) provides two polynomials in x and y: with Q M (x, y) of degree 2 d i −1 and Q R (x, y) of degree 2 d i −1 −2, such that the actual multipliers are their common roots. Lemma 7 shows that the number of common roots of these two polynomials that correspond to a valid solution is at most 2 4d i . Therefore, the number of possible trajectories after i phases is bounded by 2 2 i j=1 d j . we observe that now g j is a polynomial in X and Y which makes M(t 0 + j) and R(t 0 + j) also polynomials in X, and Y with same degrees as before. The polynomials Q M and Q R now are: Y, a, b) and Q R (X, Y, a, b) will be close to zero for all |a|, |b| 1. There may not be a point close to (x 0 , y 0 ) on which Q M (X, Y, a, b) = 0 = Q R (X, Y, a, b) due to errors in estimation ofβ andρ for phase i. However, there will always be a point close to is minimum and very close to zero. Therefore, we conclude that a small perturbation of a valid solution of Q M (x, y) = 0 = Q R (x, y) will be a minimum of Q(X, Y, a, b) for any given |a|, |b| 1. The minimum is a common zero of Define, and Although the upper bound on the number is trajectories is quite large, in practice, most of the common roots of the polynomials Q M and Q R would not satisfy one or more of following conditions that should be satisfied by any actual trajectory: 0 < β < 1, 0 < < 1, 0 < ρ < 2, −1 < c < 1. Therefore, most can be eliminated. Further, the value of parameter is unlikely to change significantly from one phase to next since testing strategies do not change dramatically in a short period. This leads to following definition: Given an actual trajectory up to phase i − 1, its canonical extension to phase i is the actual trajectory for phase i such that | i − i−1 | is minimum. The canonical actual trajectory is the trajectory obtained by taking the unique trajectory of phase 1 and canonically extending it for subsequent phases. In practice, we have observed that for most of the phases, there is only one actual trajectory corresponding to the computed detected trajectory that satisfies the above-mentioned range for parameters, which then becomes the canonical actual trajectory. Given canonical trajectory up to phase i − 1, its canonical extension for phase i can be computed using standard gradient descent method for each phase provided there is no error in data. However, if data has errors, polynomials Q M and Q R may not have any common root that corresponds to a feasible trajectory. In such a situation, which is almost always true in practice, we use the method used in proof of Lemma 7. Observe that when data has errors, even the relationships of equation (19) will not hold. Define polynomials: And combine them as: Instead of finding zeroes of all the polynomials, we find minima of P . Since i is likely to be close to i−1 and the same would be true for 1 − c i and 1 − c i−1 , we choose (x, y) = (1, 1) as the starting point. This results in following algorithm: 1. Let P x = ∂P ∂x and P y = ∂P ∂y ; 2. Find a common root of polynomials P x and P y using Newton's gradient descent method, starting from point (1, 1); Note that the common root in the above algorithm can be found quickly even though the degree of polynomials P x and P y is exponential, since evaluation of the polynomials and their partial derivatives at any point can be done efficiently. The above algorithm computes canonical trajectory for all phase except the first one. The trajectory of the first phase cannot be computed since M(0) and R(0) values are not known and therefore, neither are 1 and c 1 . We can estimate c 1 by observing that at time t = 0, the pandemic is in its initial stages and so R(0) ≈ 0. This, in turn implies that 1 1 R T (0) ≈ 0 and so c 1 ≈ 0. This only leaves 1 as unknown. We can get a good estimate of 1 if a serosurvey of the region under consideration is done at any point of time, say at t s . The survey provides an estimate of R(t s ). Varying 1 , we can compute different canonical trajectories and find the one for which the trajectory-computed value of R(t s ) comes close to the serosurvey result. Alternately, if a region has not imposed lockdown or restrictions for extended periods and has allowed normal life, we may assume that the ρ is close to 1 after 18-20 months. By varying 1 , we can compute different canonical trajectories and find the one for which value of ρ is close to 1 after 18-20 months. We call this step calibrating the model. More than one serosurveys allow fine-tuning of 1 , and thus better calibration. The above analysis leads to the following corollary of Theorem 1: Corollary 1. For a detected trajectory D, its canonical actual trajectory is given by A = ( 1 , D). Error estimates for all the parameters can be derived using the estimates forβ andρ in the following way: use upper and lower bounds of parametersβ andρ (for their 95% CI values) in the polynomial P (x, y) and compute the values and c that minimize P . These provide upper and lower bounds for and c respectively. Using these, compute upper and lower bounds for other two parameters β and ρ. Thus we get 95% CI range of values for all the parameters. Both detected and actual trajectories of SUTRA model are a concatenation of multiple SIR trajectories, one for each phase. Theorem 2. For stable period of phase i: • The detected trajectory is the trajectory of an SIR model with contact parameterβ i and populationρ i P 0 . • The actual trajectory is the trajectory of an SIR model with contact parameter β i (1 − i ) and population ρ i P 0 . Proof. We havė follow SIR trajectory with contact rateβ i , removal rate γ, and populationρ i P 0 . For the actual trajectory, we havė Therefore, S, M = M ρ i P 0 , and R = R ρ i P 0 follow SIR trajectory with contact rate β i (1 − i ), removal rate γ, and population ρ i P 0 . During the drift period of a phase, SUTRA trajectory transitions from one SIR trajectory to another. The above theorem explains why Covid-19 trajectories for short periods could be captured well by SIR models or their variants. The theorem also allows us to estimate the peak of active infections, both for detected and actual trajectories. If active infections in detected trajectory peak in phase i,Ŝ = γ β i at the peak. Sincê the condition is equivalent to: Active infections in actual trajectory will also peak at roughly the same time: if peak is during stable phase, then M = 1 i T and so the peak of M is on the same day as peak of T ; and if peak occurs during drift period, the peak of M may differ slightly. The peak occurs at Since the SUTRA trajectory is a concatenation of multiple SIR trajectories with different paramter values, the trajectory may have multiple peaks. A phase will have a peak whenever cumulative reported infections crossρP 0 (1 − γ β ) in the phase withρ andβ being parameter values for the phase. It is well-established that some infected people lose immunity over time [39] . This causes a transition from R back to S. On the other hand, vaccination of a person in S can directly move him or her to R without going through U or T groups. As the time progresses, number of both types of cases rise and start impacting the dynamics of the pandemic. In this section, we show that the revised dynamics are easily captured in the SUTRA model. Immunity loss and vaccination cause movement between groups S and R. Let L U (t) and L T (t) be the fraction of people that move at time t from S to R U and to R T respectively. Note that L U (t) would be negative if people move from R U to S and similarly for R T . Therefore, as argued earlier, we can still assume M = 1 T after the drift period of every phase. Define R 0,U and R 0,T as:Ṙ 0,U = γU, R 0,U (0) = R U (0) R 0,T = γT, R 0,T (0) = R T (0) Let ρP 0 be the effective population at time t, and R 0 (t) = ρP 0 (R 0,U (t) + R 0,T (t)) as usual. Let L(t) be the cumulative number of cases that transition from S to R via immunity loss or vaccination until time t. Then: Lemma 8. R(t) = R 0 (t) + L(t). Proof. Follows immediately from the observation thatṘ −Ṙ 0 = L U + L T . Let N (t) be the fraction of population infected at time t as usual. The following lemma is crucial: Proof. We have: The above lemma shows that: Corollary 2. The trajectory N (t) of the modified model equals the SUTRA trajectory with both β and ρ multiplied by 1 − L ρP 0 . Therefore, ones does not need to modify the model in order to capture the effect of immunity loss and vaccination on the trajectory. The computed detected trajectory from the reported data N T (t) provides the values incorporating their impact and thus the model still captures the detected trajectory well. Computing actual trajectory from these parameter values provides values β(1− L ρP 0 ), ρ(1− L ρP 0 ), amd c for every phase. While this means that we do not get the real values of β and ρ unless L is estimated, we can use the computed values to predict the near future well. A byproduct of adjustment in parameterρ is that it may decrease over time: this will happen when the change in L is positive during the drift period of a phase. This is best viewed as reduction in effective population due to vaccination. The set of pandemic parameters were estimated for each phase within each country. These estimated parameters were used to perform forward prediction of the trajectories. These are reported in the main body of the paper for six countries. The actual parameter values for these six countries are reported in tabular form here. As can be seen from those tables, the duration of each phase can vary from twenty to thirty days in the early stages of the pandemic when there are frequent and significant changes in interventions, to forty-five to sixty days in later stages. The value of 1/ for the first phase of each country, which is fixed through calibration, is highlighted in red color. COVID-19 Coronavirus Pandemic Revisiting the 1957 and 1968 influenza pandemics A contribution to the mathematical theory of epidemics Qualitative analyses of communicable disease models A model for the emergence of drug resistance in the presence of asymptomatic infections COVID-19 India. 2021 COVID-19 India Serotracker Data. 2022 Vaccination Statistics Manindra Agrawal. 2021 Tweet on 29th April Manindra Agrawal. 2022 Tweet on 13th February Manindra Agrawal. 2021a Tweet on 10th July Manindra Agrawal. 2021b Tweet on 25th July SARS-CoV-2 seroprevalence among the general population and healthcare workers in India findings from the second nationwide household serosurvey Seroprevalence of IgG antibodies against SARS-CoV-2 among the general population and healthcare workers in India Prevalence of antibody positivity to SARS-CoV-2 following the first peak of infection in England: Serial cross-sectional studies of 365,000 adults. LANCET Regional Health 2021 Seroprevalence of SARS-CoV-2 Antibodies in the US Adult Asymptomatic Population as of 2022 Seroprevalence of SARS-CoV-2 after the second wave in South Africa in HIV-infected and uninfected persons: a cross-sectional household survey 2022 Seroprevalence of Severe Acute Respiratory Syndrome Coronavirus 2-Specific Antibodies in Australia After the First Epidemic Wave in 2020: A National Survey. Open Forum Infectious Diseases Contrasting SARS-CoV-2 Epidemics in Singapore: Cohort Studies in Migrant Workers and the General Population SARS-CoV-2 Omicron-B.1.1.529 leads to widespread escape from neutralizing antibody responses The significant immune escape of pseudotyped SARS-CoV-2 variant Omicron Omicron variant (B.1.1.529) of SARS-CoV-2: Mutation, infectivity, transmission, and vaccine resistance SARS-CoV-2 Omicron has extensive but incomplete escape of Pfizer BNT162b2 elicited neutralization and requires ACE2 for infection Neutralization and Stability of SARS-CoV-2 Omicron Variant. bioRxiv Omicron variant showed lower neutralizing sensitivity than other SARS-CoV-2 variants to immune sera elicited by vaccines after boost 2022 mRNA-based COVID-19 vaccine boosters induce neutralizing immunity against SARS-CoV-2 Omicron variant Considerable escape of SARS-CoV-2 Omicron to antibody neutralization 2022 Omicron, the great escape artist Activity of convalescent and vaccine serum against SARS-CoV-2 Omicron Neutralization of the SARS-CoV-2 Omicron BA.1 and BA.2 Variants Inferred duration of infectious period of SARS-CoV-2: rapid scoping review and analysis of available evidence for asymptomatic and symptomatic COVID-19 cases Estimated COVID-19 Burden Descriptive epidemiology of SARS-CoV-2 infection in Karnataka state, South India: Transmission dynamics of symptomatic vs Persistent SARS-CoV-2 RNA Shedding Without Evidence of Infectiousness: A Cohort Study of Individuals With COVID-19 2020 A time-varying SIRD model for the Covid-19 contagion in Italy Manindra Agrawal. 2021 Tweet on 14th April Assessment of protection against reinfection with SARS-CoV-2 among 4 million PCR-tested individuals in Denmark in 2020: a population-level observational study