key: cord-0882782-cxcd7hnt
authors: Fabricius, G.; Borzi, R.; Caminos, J.; Grigera, T. S.
title: Immunity acquired by a minority active fraction of the population could explain COVID-19 spread in Greater Buenos Aires (June-November 2020)
date: 2021-12-21
journal: nan
DOI: 10.1101/2021.12.21.21267955
sha: 2f0b98fb84f00f9332646bb33fbe3dec2ff4e395
doc_id: 882782
cord_uid: cxcd7hnt

The COVID-19 pandemic had an uneven development in different countries. In Argentina, the pandemic began in march 2020 and, during the first 3 months, the vast majority of cases were concentrated in a densely populated region that includes the city of Buenos Aires (country capital) and the Greater Buenos Aires area that surrounds it. This work focuses on the spread of COVID-19 between June and November 2020 in Greater Buenos Aires. Within this period of time there was no vaccine, basically only the early wild strain of SARS-CoV-2 was present, and the official restriction and distancing measures in this region remained more or less constant. Under these particular conditions, the incidences show a sharp rise from June 2020 and begin to decrease towards the end of August until the end of November 2020. In this work we study, through mathematical modelling and available epidemiological information, the spread of COVID-19 in this region and period of time. We show that a coherent explanation of the evolution of incidences can be obtained assuming that only a minority fraction of the population got involved in the spread process, so that the incidences decreased as this group of people was becoming immune. The observed evolution of the incidences could then be a consequence at the population level of lasting immunity conferred by SARS-CoV-2.

Since the COVID-19 pandemic began in late 2019 in Wuhan (China), the disease has become a serious public health problem globally. Understanding the transmission dynamics of COVID-19 is a matter as complex as it is important, since any advances facilitate the taking of adequate control measures. 1 Although the transmission of any infectious disease is complex, in the case of COVID-19, several specific features of the disease make it difficult to characterise and control the contagion process. The possibility of contagion before the appearance of symptoms, the existence of asymptomatic individuals, the diversity in the symptoms and the heterogeneity in the immune response observed in different individuals, are just some of the characteristic elements that have become known as the pandemic moved along.

One of the most controversial aspects of the disease is the issue of the immunity generated by SARS-CoV-2 in individuals. The mechanisms of natural immunity and the degree of protection against new infection they eventually confer are subject of intense research. 2 At this point, it is quite clear that natural immunity protects against disease, but the degree of protection obtained against contagion is not so evident. 2-5 The epidemiological impact of individual immunity is even less clear. This is difficult to assess because every epidemiological observable (such as reported cases or serological studies, for example) is affected by a multiplicity of factors, such as effectiveness of health system surveillance or uncertainties in antibody test results. In some cases, the study of the evolution of the pandemic in certain regions at definite times, offers the possibility of exploring the relationship of some of these factors with the temporal evolution of the reported cases. This is what we set to do here, taking advantage of the particular conditions that occurred in a region of Argentina during a six-month period within the pre-vaccination era.

The evolution of the pandemic has produced rising and falling incidence curves in every region, but the reasons behind this behaviour are often very different. For example, the first outbreak observed in Italy between March and June 2020 can be explained by assuming that at the beginning of March the disease spread freely with a high R 0 . Severe restrictions were imposed, causing the value of R 0 to drop violently below 1, and a consequent decrease in daily cases to values insignificant compared to those at the peak. 6 The evolution of cases in Greater Buenos Aires (GBA), Argentina, between June and the end of November 2020 (Fig. 1) is qualitatively similar, but requires a very different explanation. In Argentina, severe isolation measures were established a few days after the appearance of the first cases, when there was practically no community transmission; these measures were then gradually relaxed over time, and never hardened again. At the beginning of June there was a sustained community transmission in all GBA districts, with an average incidence of between 4 and 5 cases per day per 100,000 inhabitants, but no new measures were taken. Thus, the sustained fall of incidence observed since the end of August in Argentina must have a different origin than in the Italian case. Although it is possible that some health measures, such as the use of the face mask or the protocols in hospitals, had been improving over time, it is difficult to attribute the drop to these reasons. Indeed, public data 7,8 (Fig. S1) show increasing mobility in all spaces, even as the incidence continued to decrease. A plausible hypothesis is that this sustained decrease was due to the acquisition of immunity. In this work we explore, through mathematical modelling, under what conditions the fall in incidences that occurs between the end of August and November (which is a phenomenon observed in the 24 GBA districts, Fig.S2 ) can be explained from the acquisition of immunity by vast sectors of the population. The results of our simulations show that there are several scenarios compatible with the available epidemiological information on the spread of COVID-19 in GBA which account for the observed dynamic evolution of the incidences, as long as it is assumed that the transmission process was basically caused by a minority but active fraction of the population that was acquiring immunity.

FIG. 1. COVID-19 incidences in the Greater Buenos Aires (GBA) from March 2020 to February 2021. Data are consigned in cases per day per 100, 000 inhabitants, (red line, left axis). We also show the fraction of the population reported as confirmed case (gray line, right y-axis). Both curves are drawn as a function of the symptoms onset date (SOD), estimated from reported data. 9 Details of the incidences computation are given in the Supplementary Material (SM), section I.B.

Located in Buenos Aires province, the Greater Buenos Aires (GBA) is made up of the 24 districts closest to Buenos Aires city (CABA) with a total population of 9,916,715 inhabitants according to the last census carried out in 2010. Except for La Matanza, which has a population of 1,775,816 inhabitants, all the other districts have less than 650,000 pop. There are 4 districts (San Fernando, Ezeiza, Ituzaingó and Hurlingham) with less than 200,000 pop., 10 districts (José C. Paz, Vicente López, San Miguel, San Isidro, Esteban Echeverría, Morón, Malvinas Argentinas, Berazategui, Tres de Febrero and Avellaneda) that have between 250,000 and 350,000 pop., 3 districts (Tigre, General San Martín and Florencio Varela) between 350,000 and 450,000 pop., 3 districts (Moreno, Lanús and Merlo) between 450,000 and 550,000 pop. and 3 districts (Almirante Brown, Quilmes and Lomas de Zamora) between 550,000 and 650,000 pop. The projected GBA population for July 2020 is 11,264,104, 10 therefore, in order to calculate the incidences, we will correct all the populations by a multiplicative factor of: 1.136 = 11,264,104 / 9,916,715.

The first confirmed case of COVID-19 in Argentina was on March 3rd, 2020. On March 16th, schools and non-essential activities were called off in large cities. On March 20th, a lock-down (Preventive and mandatory social isolation or ASPO in Spanish) was decreed throughout the country, and proceeded from that moment on, with changes to different phases in different regions. On June 4th, 18 provinces discontinued their lock-down, but the GBA was among the regions that did not. Strictly speaking, the GBA was in lock-down from March 20th to November 9th, when it was replaced with social distancing measures (preventive and mandatory social distancing or DiSPO in Spanish). That is, throughout the period to be studied here a lock-down was officially in place. 11 However, communication and actual enforcement of the measures was variable through the period, and so was compliance. Mobility data from Google 7 ( Figure S1 (a)) indicates that mobility in supermarkets, jobs and transport stations began to increase sharply just after the ASPO was decreed until the beginning of June. From June, these indices remained constant or increased at a much slower rate. Fig. 1 shows the incidence in GBA vs. date of onset of symptoms (the same data for individual districts is in Fig. S2 ). From the beginning of June, the incidence in GBA shows a sustained growth (at approximately constant rate) for 2 months, from between 4 and 5 daily cases per 100,000 pop. to 40 daily cases / 100,000 pop. Then for the next 20 days it oscillates around 35 daily cases / 100,000 pop. and finally begins to fall steadily until the end of November. The rate of decrease is also approximately constant but lower than the initial increase rate.

Although the evolution of incidences shows some peculiarity in each individual district, the mentioned qualitative features are the same across the whole GBA region. In most districts the rise and fall times coincide with the global GBA curve, and in some districts (such as Almirante Brown, Berazategui or Morón) even the absolute values are very similar to the GBA case throughout the period.

The incidences estimated from officially reported cases are a lower bound for the actual incidences of COVID-19.

The causes of under-reporting are varied and most likely originate in cases that were asymptomatic or with weak symptoms, although there may be other reasons, such as those concerning the efficiency of the detection system. It is very difficult to estimate the number k of actual cases per reported case. The literature estimates vary widely, according to the criterion adopted and the stage of the pandemic. [12] [13] [14] [15] [16] [17] We discuss under-reporting for GBA in the Supplementary Material (SM: section I.D) and conclude that a plausible range for the average value of k in the studied period is between 3 and 6. However, it is important to keep in mind (as we discuss later) that under-reporting could vary somewhat over time, and also be different in different districts.

Mathematical modelling

In a subject with the complexity of the spread of COVID-19, it is impossible to consider a mathematical model that takes into account all the elements and the heterogeneity present in the problem. The spirit of our modelling is to take into account the central ingredients, assume certain hypotheses (based on a set of evidences) and then use the model as an exploration tool. In particular, we are interested in exploring under what conditions the model reproduces the increase in incidences by a factor of approximately 8 after approximately 70 days and the subsequent decrease by a similar factor in a slightly longer time, both observed in the dynamic evolution of the reported incidences ( Fig.1 and Fig.S2 ). Given the uncertainties in the knowledge of the transmission process and the ignorance of the cases of COVID-19 that really existed in the analysed period (of which the reported cases are only a fraction), we do not seek to obtain the set of model parameters that best fits the reported cases, but to evaluate the degree of plausibility of different scenarios.

To build our model we assume the following:

1. All infected individuals go through a latency period after which they become contagious. We not distinguish between the contagion capacity of symptomatic and asymptomatic individuals, nor will we take into account whether some were isolated and others not: all infected individuals infect equally but the time each individual is contagious can vary and is described by a probability distribution.

2. When infected individuals recover, they acquire immunity lasting at least 6 months (the period we attempt to model 18 ).

3. The structure of social contacts through which contagions occur stayed the same in GBA between June and November 2020. This is probably not strictly true, but nevertheless a reasonable working hypothesis.

Only a fraction f C of the individuals participated in the contagion process. This hypothesis is based on the situation that occurred in Argentina, where some individuals were very little exposed to contagion (because they performed virtual tasks, and when they went out they took care of themselves using face mask and respecting the decreed social distancing) while others suffered a much higher degree of exposure (because they did not comply with the standards of care in force, or due to their work situation). There were probably several intermediate behaviours but we will assume as a working hypothesis that the population is divided into two types of individuals: those who participate in the contagion process and those who do not.

5. In the period studied (June-November 2020), the strongest contacts occurred at home, the next strongest among groups of close families, then among the neighbourhood and so on, decreasing as the spatial scale increases. This is justified given that there were still serious restrictions on allowed activities.

6. The epidemic "ran by itself" within each district of GBA from June on, when community transmission was already occurring in most neighbourhoods. That is, we neglect effects due to cases imported from other districts or from abroad. This would not be justified in the previous months, in which the influence of imported cases from abroad or from the city of Buenos Aires may have played an important role. The simulations we have carried out to validate this hypothesis are shown in the SM (section III.B.2).

We consider a stochastic epidemiological model where a susceptible individual, once infected, goes through the chain of states S → E 1 → E 2 → I 1 → I 2 → R, corresponding respectively to susceptible, two stages of exposed, and two stages of infected individuals. (see Fig. 2 ).

The population is made up of N individuals where the individual j is in the epidemiological state X j . The state of the system at a given instant is defined by X = (X 1 , X 2 , ...X N ). X(t) is a stochastic variable that evolves in time according to the Markov process defined by the transitions indicated in Fig. 2 .

The probability of contagion per unit of time of individual j, W j inf is determined by the assumed social structure. We use a hierarchical contact structure like the one schematised in Fig. 3 . This allows considering a certain . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint

The copyright holder for this this version posted December 21, 2021. ; https://doi.org/10.1101/2021.12.21.21267955 doi: medRxiv preprint FIG. 2. Epidemiological model used in this work. A given individual is described by its epidemiological state X. A susceptible individual (X = S) may became infected in contact with an infectious individual (I1 or I2, which represent two stages at which the individual can infect) with a probability rate W inf . E1 and E2 are latency states (the exposed individual is infected but cannot yet infect). The transitions from E1 to the successive states occur randomly with constant probability rates σ1,2 and γ1,2. The final state (X = R) represents a recovered, immune individual. We assume that there is no conversion from R to S during the period of our study (possible effects of this not being true are discussed in the SM (section III.B.1). Two epidemiological states have been chosen to describe the latency phase and another two for the infectious phase so that the model reproduces distributions of latency and infection times compatible with those reported in literature and available epidemiological data, see SM (section II.B). Using more than one state for the same epidemiological phase is known to produce more realistic transition dynamics, 19 in particular two classes have also been used to model COVID-19 by other authors. 20 W inf is not constant, but depends on the instantaneous configuration of infected individuals (see text). spatial heterogeneity in the spread of the disease compatible with the mobility restrictions in force in the period studied. Each individual is associated with a group of a given level. Within each level l (household, building, neighbourhood and so on) an individual interacts homogeneously with the other individuals in the same group with the same effective rate of contacts per unit of time. We indicate with l = 1, 2, . . . , L the level in the group hierarchy. An individual j belongs to group ν l,j at level l, which has N ν l,j members. The probability of transition per unit of time for the infectious process is given by

where N ν l,j (I 1 ) and N ν l,j (I 2 ) are the numbers of individuals of group ν l,j in states I 1 and I 2 respectively. The parameters β l are associated with the probability of contacts per unit of time between individuals within the level l. In addition, according to hypothesis 4 above, some families (groups with l = 1, level "household") are randomly marked "inactive" (i.e. they have no social contact) and do not participate in the contagion dynamics even if they are in state S. These families can eventually become active at a later stage of the dynamics. We will take N ν l,j = N l , depending only on the level (i.e. all families, neighbourhoods etc. are all the same size). Note however that since the choice of which families will be inactive is random, the number of active individuals will be different for different groups at the same level for l > 1, and therefore the effective contact Fig. 2 ) are hierarchically grouped in families which belong to buildings, neighbourhoods, etc.. Black circles are either recovered (red edge) or inactive. In either case they are inert as regards their capacity to infect or become infected. Recovered individuals remain in this state for the whole simulated period, while some of the inactive may be activated (turning them to susceptible) in order to simulate a behavioural change in those individuals. The fraction, fC , of inactive individuals is a parameter of the model and fixed at the beginning. Inactive individuals always belong to isolated families (marked by black borders and fully occupied by inactive individuals), chosen randomly within the different neighbourhoods and towns. Some spatial structure is taken into account by the hierarchical nature of the model. The contact rate for individuals of the same household (β1, marked in blue) is assumed stronger than for individuals from different households (β2, brown). Individuals from different neighbourhoods interact with an even smaller rate (β3, yellow), while people from different towns (the next hierarchical level) interact with β4 (light blue).

rates will fluctuate around a mean value β eff l ≈ f C β l for N l 1, where f C is the fraction of active individuals, see SM (section II.A). To simplify matters, we will generally refer to a given level using the number of individuals in a given group of this level rather than the index l. For instance, a level with groups of N l = 20000 individuals will be indicated as a "n2E4 neighbourhood". A group of the first level is usually mentioned as a family or household.

Stochastic simulations are performed with the Gillespie algorithm. 21 Simulation software is available for download at the GitHub repository (see Data Availability section). The incidence Inc(t) (in cases per day) is computed as the number of individuals whose state changes from exposed to infected (E 2 → I 1 ) during day t, and the reproductive ratio, R(t), at day t as

where t inf is the mean duration of infection, N [S → E 1 ] is the number of individuals whose state changes from . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint

The copyright holder for this this version posted December 21, 2021. ; https://doi.org/10.1101/2021.12.21.21267955 doi: medRxiv preprint susceptible to exposed between day t − 1 and day t , and N I (t − 1) is the total number of infected individuals (in states I 1 or I 2 ) at day t − 1.

For the results presented in this section, we take σ 1 = σ 2 = 0.66667 1/day which corresponds to an average latency time of 3 days, γ 1 = γ 2 = 2γ = 0.28571 which corresponds to a mean time of infection t inf =1/γ=7 days, and β 1 =0.214 1/day for the rate of household contacts. In the SM (section II.B) we provide a justification for the choice of these values based on several studies and GBA-epidemiological data. 6% of the population, respectively. Either this percentage or the time at which the peak occurs misses those observed in reported data by much more than can be reasonably expected, pointing to a failure within this simple approach. The uniform mixing approximation is achieved in our model by taking: β1 = β2 = ...β k−1 = 0, β k = R0 * γ, fC = 1. These curves were obtained by solving the deterministic equations (which correspond to the limit N →∞) starting from an initial condition where 99.99% of the population is susceptible and 0.01% is in the infectious state I1.

To motivate the need for the spatial model we have introduced, we begin by studying the evolution of the incidences using the much simpler uniform mixing approach. Figure 4 shows the incidence as a function of time under uniform mixing for two values of R 0 . What we observe is that for R 0 =1.45, the incidence grows by a factor of 8 until reaching the maximum in a time of 70 days (as observed in the incidences obtained from reported cases of the GBA, Figure 1) . However, the absolute values obtained are almost 20 times greater than those reported, resulting in 55.0% of the population infected at the end of the epidemic (a much higher proportion than the 11.6%

. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this this version posted December 21, 2021. ; https://doi.org/10.1101/2021.12.21.21267955 doi: medRxiv preprint TABLE I. Parameters that define the 5 scenarios considered. Number of levels (L), number of individuals in level 1 (N1), number of groups of size N l−1 individuals that make up level l (n l , l > 1), household contact rate (β1), average effective contact rate between individuals of the same group in level l (fC β l ), and mean fraction of active individuals (fC ) defining the model. Note that the number of individuals on a group in level l can be calculated as N l = N1n2 . . . n l , l > 1. Iini is the initial number of infected introduced at time −tini. The constant k is not a model parameter: it is the factor we use to multiply the reported incidences so that they approach the incidence curve obtained for each scenario (see Figs. 5-7) . In a similar way, tini (in days) is determined after the simulations, so that it maximises the agreement with the epidemiological data. We have explored many more scenarios, part of which are described in section III.A of the SM and Ref. 29 obtained from serological studies 30 ). On the other hand, for R 0 =1.17 the incidence obtained from the model takes values around 3 times those reported (which are within the plausible range), but the development time of the epidemic turns out to be greater than one year (more than twice the observed time). It is clear that the homogeneous mixing approach gives a description of the development of the epidemic that does not even approximate the orders of magnitude involved, pointing that a drastic change in this approach is necessary. One possibility (and one of the key points in this work) is to assume that only a small fraction of the population participates in the contagion process; this could explain how a maximum in the incidence curve could be attained in a relatively short time without a huge fraction of the population getting the disease. Instead, the peak would occur when a relevant fraction of the most active (regarding their chances of getting and spreading the illness) population is infected. For example, assuming that 25% of the population is active and contacting each other homogeneously while the rest remain inactive, for a value of R 0 =1.45, the number of cases would be reduced to 25% and the same would happen with the incidences (since the total population does not change). This produces incidences only about of the order of 5 times larger than those in Fig.1 , compatible with expectations, see SM (section I.D).

To explore this idea in more realistic situations, we considered various plausible structures corresponding to the hierarchical model in Figure 3 that reproduce the central features of the dynamic evolution of the incidences observed in Figure 1 . We consider a system of 300,000 inhabitants (size of a typical GBA district), starting from a state in which a fraction f C of individuals participates in the dynamics of the epidemic (active) and the rest is inactive. In the initial condition t = t ini , all active individuals are susceptible except I ini individuals that are in state I 1 . We are not concerned here with how those initially infected entered GBA; we briefly discuss this in the SM (section III.B.2). The system is allowed to evolve freely and, in order to compare with data, reported incidences are multiplied by a factor k and the origin of the time axes adjusted so as to obtain the best agreement in each case.

The parameters of the five scenarios discussed in the present section are presented in Table I (more scenarios are considered in section III.A of the SM). Top panels for Fig. 5a show the incidences obtained from scenarios 1 and 2. In scenario 1, the individuals interact within the family and with any other individual in the district while in scenario 2 there is a higher hierarchy of interactions that decrease in magnitude with the spatial scale (β 1 > β 2 > β 3 β 4 β 5 ). In both cases, incidences values are obtained that are compatible with those reported with k = 4.5, which is within what it is expected. In scenario 1, where there are only two hierarchical levels (i.e. the scenario where the active individuals interact almost homogeneously), the incidence shows a faster fall than in scenario 2, in which the interactions between individuals from different neighbourhoods are very low.

The slower fall in incidences in scenario 2 can be understood by observing the evolution of those infected in the different neighbourhoods that make up the district (Fig.  5 (b) and c) ). Although in both cases there are stochastic fluctuations at the different levels, in this case such fluctuations are stronger, and there is a significant lag in the beginning of the epidemic within each neighbourhood as well. In addition to this lag, there is also a marked heterogeneity in the way in which the epidemic develops in the different neighbourhoods. These differences are the origin of the slower decrease of incidence found in scenario 2 (Fig. 5 (a) ). These two scenarios have been chosen just to exemplify cases of high or low coupling between neighbourhoods. We consider other cases in the SM (section III.A).

The differences in incidences that have been observed between different GBA districts ( Figure S2) can be reproduced by simulating different systems with slightly different values of the parameters β l , N l o f C (which would imply attributing them to different individual behaviour in different districts). However, they can also be obtained from different simulations of a system with the same set of parameters (i.e., they are a consequence of the stochastic nature of the model, which is more evident for certain sets of parameters, as we will see in the . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this this version posted December 21, 2021. ; https://doi.org/10.1101/2021.12.21.21267955 doi: medRxiv preprint following). Fig. 6 shows 4 realisations of a scenario 3 system, with parameters similar to scenario 2, but where initially there are I ini = 60 infected (instead of 150). The lower number of initially infected distributed in the system in this case increases its initial heterogeneity, which translates into differences at district level that were not observed in the different realisations of scenario 2 (not shown).

FIG. 6. Less initially infected agents: more fluctuations. Simulated (blue) and reported (red) daily cases per 100000 inhabitants as a function of time for a total population of 300000, for the hierarchical model of scenario 3, similar to scenario 2 (shown in Fig. 5a ) right panel) but for Iini = 60. Panels a) to d) show different realisations of the same model with the same number of initially infected individuals that are randomly assigned in a different way into the system. We observe marked fluctuations in the behaviour, based on the stochasticity of the model and supported by its intrinsic nonhomogeneous structure. Similar fluctuations have been observed on data corresponding to different districts within the GBA (Fig. S2) . One of our points is to show that these fluctuations can be based on real differences on the district connective structure, or simply on the stochastic nature of the processes under way. In this case the data were multiplied by a constant factor k=5 (also compatible with a plausible under-reporting) in order to approach the simulation results.

In Fig. 7 we show the results of scenarios 4 and 5 (see Table I ) which have structures similar to those of scenario 2, but also the fraction of active individuals is higher. It is seen that it is possible to reproduce the observed dynamics, but with k factors that are too large and incompatible with the available epidemiological information. Therefore we conclude that to obtain results compatible with the data, f C must be lower than 0.50. This picture, in which there is a relatively small fraction of active individuals, implies a potential risk: that the majority fraction who remained inactive would join the arena -either because they stop taking care of themselves or because they get involved in activities in which they did not participate. This may have been at the bottom of what happened in Argentina at the beginning of December when there was a significant increase in cases   FIG. 7. Other fractions of active individuals. Panels a) and b) correspond to scenarios 4 and 5 , with fC = 0.50 and fC = 1.0, respectively. It is seen that the dynamics of the curve can also be reproduced as in scenario 2 (Fig. 5 , right panel) but the required k-factors (10 for case 4 and 22 for case 5) are not compatible with the epidemiological information available.

that had its maximum near the new-year celebrations. It is possible that this rise was the result of two effects: more people becoming active, and an increase of infectious contacts. In order to illustrate the consequences of the first of these effects, two simulations (f25 and f50 in Fig. 8 ) were performed where additional fractions of 25 and 50% of individuals were set active by the end of November, preserving the same effective contact rates as the previous active individuals (same values of β 1 and f C β l , for l > 1). The results are shown in Fig.8 . Panel a 2 ) shows that freeing an additional 50% of the population produces a curve compatible with the rate of rise of incidences observed in the reported data. There were many changes in the epidemiological situation beyond the end of November 2020; this accounts for the structure observed in the data shown in Fig. 8 , which is beyond what we are trying to model here. The purpose of this study concerning dates after November 2020 is to show that the huge increase in cases that was observed at the end of 2020, and later in 2021 (Fig. 8) is compatible with a significant number of individuals who remained susceptible and began to interact.

Before drawing conclusions, some considerations regarding the hypotheses and the results must be made, given the uncertainties concerning the epidemiological data, as well as those that still exist about the characteristics of the disease.

In order to compare the results of the simulations (which provide with the number of all the "real" cases once a model and a given realisation of randomness are assumed) with the epidemiological data (which suffers from under-reporting) we have multiplied the latter by a time-independent factor k. Based on previous works [30] [31] [32] [33] [34] [35] we estimate the value of k between 3 and 6 (see SM: Sec. I.D)). A first remark is that, as we noted at the end of Sec. IIA, the factor k might not be constant over time, . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. Fig. 5 , now exploring scenario 2 beyond the end of November. Panels a) show the simulated (blue) and reported (red) daily cases per 100000 inhabitants as a function of time for a total population of 300000, for Iini=150 initially infected. As in Fig. 5 , right panel, the simulations were performed with 75% of the population remaining inactive regarding the infection or transmission of the virus. However, at day 180 (in correspondence with end of November) this fraction was reduced by freeing (changing its state from inactive to active) a fraction of the population of 25% (f25, left panel) and 50% (f50, right panel). There is a clear correspondence with a minor "second wave", shown as a peak in the data. Our simulations clearly signal the potential risks of this highly metastable situations (particularly in the second huge peak seen in panel a2), blue curve). Panels b) explore the different n2E4 neighbourhoods, and panels c) and d) what happened for different n2E3 within two n2E4 neighbourhoods.

which can affect the shape of the curve of real cases in a way that is difficult to predict. At this point it is then very important to differentiate what may be considered an artificial feature, and what a solid fact. For example, it is not so straightforward to conclude from the comparison of the red and blue curves in panels a 1 ) and a 2 ) of Fig. 5 , that Scenario 2 (highly heterogeneous) describes the spread of COVID-19 in GBA better than Scenario 1. On the other hand, given that the definition of a suspect case became more inclusive over time, 36 and that there is no reason to think that the vigilance has been greatly relaxed, the massive drop in cases observed in Fig. 1 from September onward remains a basic feature that any reasonable proposal should try to explain.

A second remark is that the slow decline of reported cases (panels a) in Fig. 5 could be the consequence of an increase in the values of the contagion rates (which we have assumed constant) since September, rather than due to the stochastic heterogeneity among different neighbourhoods proposed within scenario 2. Although a hypothetical increase in rates could be attributed to the slight observed increase in mobility or a relaxation of preventive measures, this could have been compensated by the fact that more activities were carried out in open spaces (due to increasing temperatures) which would lower those rates.

However, there is independent epidemiological evidence that points in the direction of a heterogeneous scenario in the spread of COVID-19 in the GBA. In a seroprevalence study carried out in neighbourhoods of different districts of the GBA, 34 a high heterogeneity was found without a clear explanation, sometimes finding very different values for seroprevalence in nearby neighbourhoods with similar characteristics. According to our simulations, stochasticity in weakly coupled neighbourhoods could account for this phenomenon. For example, looking at two of the n2E4 neighbourhoods of panel b 2 ) of Fig. 5 , we see that the fraction of individuals recovered up to t = 150 (October 18) in the neighbourhood indicated with pink circles is 18%, while it is only 11% for that with green circles. Moreover, for that same date, the fraction of individuals recovered in each of the n2E3 neighbourhoods of panel d 2 ) of Fig. 5 fluctuates between 4 and 19%. Since only 25% of individuals on average can get sick (f C =0.25), this spread of values is markedly heterogeneous.

An important point is that the description that emerges from scenario 2 is not the consequence of a particular choice of model parameters. We check in the SM that the results are robust under changes within plausible values of these parameters (section III.A), under the consideration of the distribution of household sizes corresponding to GBA (section III.B.3), and also under the possibility of an increasing income of infections originated in CABA, where many GBA residents work (section III.B.2).

A third remark concerns natural immunity. Even though the mechanisms that generate natural immunity are not fully understood and are subject of research, several studies estimated that infection with SARS-CoV-2 provided 80-90% protection from reinfection for up to 7 months. 2,3,5 Natural immunity would confer protection against symptomatic and also asymptomatic reinfections. Reinfection is rare but has also been reported. In order to simplify the approach, we haven't considered this possibility here, but we explore the issue in the SM (section III). Using a deterministic model and assuming that 20% of the population does not acquire immunity after infection, we find that the impact of the effect is low: the rate of increase in incidence is basically unchanged, and the peak is reached for a value 18% higher. We therefore expect that the inclusion of this effect in the hierarchical model will not produce qualitative changes beyond a moderate increase in the value of k necessary to approximate the data.

It is important to acknowledge that there exist previous works discussing the possible reduction of the diseaseinduced herd immunity level as a consequence of the fact that social contacts in the population can be very inhomogeneous. 37, 38 The idea is that the better connected individuals tend to become infected (and then, recovered and immune) before the less connected ones. The latter, with a much lower potential to infect or become infected, become increasingly predominating within the active population; this naturally lowers the immunity fraction needed to stop the propagation of the disease. In a way, our assumption that only a minority fraction participates on the contagion process takes this idea of heterogeneity to an extreme. In spite of this, it is important to realise (as exemplified in Fig. 8 ) that in our case it is not a real herd immunity, since inactive individuals constitute a pool of susceptibles with the potential to become active.

The fourth remark regards pre-existing immunity to SARS-CoV-2. Detection of SARS-CoV-2-reactive CD4+ T cells in unexposed individuals, suggests cross-reactive T cell memory between circulating "common cold" coronaviruses and SARS-CoV-2. 39 However, it has been argued that pre-existing SARS-CoV-2-specific T cells are unlikely to provide sterilising or herd immunity, and it is unclear whether it even affects the severity of the disease. 40, 41 So, it does not appear that pre-existing immunity affects the results of this study.

Finally, weather may play a role that is not trivial to analyse. Low temperatures are usually associated with less ventilation and more transmission. This was probably not a great influence, since the curve stops growing and starts to go down during August when it is still relatively cold in this region (mean temperatures vary from ≈ 11 in July to ≈ 13 Celsius in August 42 ). On the other hand, at the beginning of the southern summer the curve was in full rise but, as mentioned, the leading effect at that time was probably from the social new-year gatherings. In any case, it cannot be ruled out that the weather may have had some influence.

The evolution of incidences in Greater Buenos Aires presents certain characteristics shared by the 24 districts that comprise it. The incidence curve goes though a peak with a fast increase during June and July, and a somewhat slower downward slope encompassing September, October, and November. This evolution occurred under very particular conditions. The six-month period between June and the end of November 2020 was before the massive application of vaccines; also -and in con-trast with other peaks observed during the COVID-19 pandemic in other places-the important social and hygienic measures had been taken before the fast incidence increase, and remained fixed during the curve development. Here we have shown that a coherent explanation of this evolution can be obtained if it is assumed that a minority fraction of the population participated in the contacts, so that the incidences decreased as this group of people was becoming immune. This picture is compatible with the strong outbreaks seen later, which point to the existence of a considerable pool of susceptible individuals. Our explanation implies that the evolution of the incidences in GBA would be the epidemiological manifestation of immunity conferred by SARS-CoV-2 infection.

The hypotheses assumed account for the observed behaviour without the need to fine-tune the set of parameters. This is one of the strengths of our model, whichalthough it could be tuned to match closely the available data-we use to explain the curve drop and to analyse a number of alternatives consistent with the several factors that may have influenced the dynamics within the epidemiological uncertainties.

Taking as true the rate of rise of reported incidences in June and July 2020, our model predicts a basic reproduction number R 0 around 1.5 during epidemic spread. On the other hand, the fraction of active population f C can not be accurately determined, since it depends on the actual number of cases that (due to under-reporting) can only be estimated roughly. One of our hypothesis is that individuals interact with markedly decreasing strength as one considers wider circles of contacts. This is compatible with the fact, supported by epidemiological evidence, of great heterogeneity in the transmission process in GBA. This heterogeneity, plus the intrinsically stochastic nature of the transmission process means that, even under the same conditions, results of the transmission at local levels can be very different in different districts. This constitutes a difficulty when trying to evaluate the effects of a public health policy, because it may be implemented in the same way in two places but result in a very different number of cases.

We acknowledge financial support from the Agencia Nacional de Promoción Científica y Tecnológica (AN-PCyT) through PICT 2017-2347, and the Consejo Nacional de Investigaciones Científicas y Técnicas (CON-ICET) through PIP 0446 and PIP 0292. G.F. wishes to aknowledge helpful discussions with colleagues from the "Red de Modelización de Enfermedades Infecciosas" (RITS-CONICET) and from the "Subcomisión de Vacunología" (Argentine Association of Microbiology).

. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint

The copyright holder for this this version posted December 21, 2021. ; https://doi.org/10.1101/2021.12.21.21267955 doi: medRxiv preprint VII. DATA AVAILABILITY Simulation software in C++ was developed for this work, and is available in the GitHub repository, https: //github.com/tgrigera/COVIDm. Data was provided by the Health Ministry of Argentina through the National Science Council (CONICET). Most of the data is available in the following site:

https://datos.gob.ar/dataset/ salud-covid-19-casos-registrados-republica-argentina/ archivo/salud_fd657d02-a33a-498b-a91b-2ef1a68b8d16, and the rest can be requested at https://www. argentina.gob.ar/solicitar-informacion-publica since in Argentina data is public by law 27.275 "Derecho a la información pública".

Population estimates by sex, department and calendar year 2010 -2025

Health measures for the COVID-19 pandemic in Argentina

Estimated incidence of coronavirus disease 2019 (COVID-19) illness and hospitalization

Substantial underestimation of SARS-CoV-2 infection in the United States

State-level tracking of COVID-19 in the United States

Undiagnosed SARS-CoV-2 Seropositivity During the First Six Months of the COVID-19 Pandemic in the United States

Estimating the undetected infections in the COVID-19 outbreak by harnessing capture-recapture methods

SARS-CoV-2 antibody prevalence in Brazil: results from two successive nationwide serological household surveys

In the SM (section III.B.1) we explore the consequences of assuming that only a fraction of individuals acquire immunity

Modeling infectious diseases in humans and animals

Early analysis of the Australian COVID-19 epidemic

A general method for numerically simulating the stochastic time evolution of coupled chemical reactions

Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia

Clinical characteristics of coronavirus disease 2019 in China

SARS-CoV-2, SARS-CoV, and MERS-CoV viral load dynamics, duration of viral shedding, and infectiousness: a systematic review and meta-analysis

National Center for Immunization and Respiratory Diseases (NCIRD) Division of Viral Diseases

Household transmission of SARS-CoV-2: a systematic review and metaanalysis

The role of children in the spread of COVID-19: Using household data from Bnei Brak, Israel, to estimate the relative susceptibility and infectivity of children

Transmission of SARS-COV-2 infections in households-Tennessee and Wisconsin

Numerical exploration of the conditions in which the acquisition of immunity can account for the observed drop in incidences in Greater Buenos Aires

Provincial COVID-19 seroprevalence survey

Surveillance and seroprevalence: Evaluation of IgG antibodies for SARS-Cov2 by ELISA in the popular neighbourhood Villa Azul

COVID-19 seroprevalence survey methodology and definitive results

Severe acute respiratory syndrome Coronavirus 2, Seroepidemiology study in Argentinian slum

Seroprevalence of antibodies against SARS-CoV-2 in popular neighbourhoods of Buenos Aires Province

Estimation of detectable but undetected cases

Stochastic model for COVID-19 in slums: interaction between biology and public policies

A mathematical model reveals the influence of population heterogeneity on herd immunity to SARS-CoV-2

Individual variation in susceptibility or exposure to SARS-CoV-2 lowers the herd immunity threshold

Targets of T cell responses to SARS-CoV-2 coronavirus in humans with COVID-19 disease and unexposed individuals

The known unknowns of T cell immunity to COVID-19

Cross-reactive memory T cells and herd immunity to SARS-CoV-2

Climate of Argentina