key: cord-0561360-w0c6s34l
authors: Muller, Sebastian A.; Balmer, Michael; Charlton, William; Ewert, Ricardo; Neumann, Andreas; Rakow, Christian; Schlenther, Tilmann; Nagel, Kai
title: A realistic agent-based simulation model for COVID-19 based on a traffic simulation and mobile phone data
date: 2020-11-23
journal: nan
DOI: nan
sha: a64efd6b6577dccacff6deac726f8729a8f9c064
doc_id: 561360
cord_uid: w0c6s34l

Epidemiological simulations as a method are used to better understand and predict the spreading of infectious diseases, for example of COVID-19. This paper presents an approach that combines person-centric data-driven human mobility modelling with a mechanistic infection model and a person-centric disease progression model. The model includes the consequences of disease import, of changed activity participation rates over time (coming from mobility data), of masks, of indoors vs. outdoors leisure activities, and of contact tracing. Results show that the model is able to credibly track the infection dynamics in Berlin (Germany). The model can be used to understand the contributions of different activity types to the infection dynamics over time. The model clearly shows the effects of contact reductions, school closures/vacations, or the effect of moving leisure activities from outdoors to indoors in fall. Sensitivity tests show that all ingredients of the model are necessary to track the current infection dynamics. One interesting result from the mobility data is that behavioral changes of the population mostly happened before the government-initiated so-called contact ban came into effect. Similarly, people started drifting back to their normal activity patterns before the government officially reduced the contact ban. Our work shows that it is possible to build detailed epidemiological simulations from microscopic mobility models relatively quickly. They can be used to investigate mechanical aspects of the dynamics, such as the transmission from political decisions via human behavior to infections, consequences of different lockdown measures, consequences of wearing masks in certain situations, or contact tracing.

The general dynamics of virus spreading is captured by compartmental models, most famously the so-called SIR model, with S = susceptible, I = infected/infectious, and R = recovered [1, 2]. Every time a susceptible and an infectious person meet, there is a probability that the susceptible person becomes infected. Some time after the infection, the person typically recovers. Variants include, e.g., an exposed (but not yet infectious) compartment between S and I.

Instead of running these models with compartments, one can run them on a graph [3, 4, 5, 6, 7]. Persons are represented as vertices, connections between persons are denoted as edges. The random interactions that are implied by the compartmental models are then replaced by interactions with graph neighbors.

In reality, these interactions change from day to day; in particular, possible superspreading events like weddings or other large gatherings cannot be encoded in a static graph. For this, temporal networks have been investigated ([7], section VIII).

Finally, a "different framework emerges if we consider nodes as entities where multiple individuals or particles can be located and eventually wander by moving along the links connecting the nodes" [7].

When COVID-19 took hold in Europe, one model of this type by Imperial College [8, 9] had a large impact on policy in the UK. Other examples of this approach are by the Virginia Biotechnology Institute [10, 8] and by the Center for Statistics and Quantitative Infectious Diseases in Seattle [11, 8]. Examples for similar approaches on the global level are [12, 3]. Groups that started more recently include [13] and [14].

Our own approach in this direction, presented in this paper, continues work by Smieszek et al. [15, 16] and by Hackl and Dubernet [17]. The important difference, and major innovation, is that our model is entirely data-driven on the mobility side, i.e. both the "normal" person trajectories and the reduction of activity participation over the course of the epidemic stem from data. This considerably speeds up the model implementation and reduces the number of free parameters. In the present paper, we show how such a system was built up rather quickly from pre-existing synthetic mobility traces from mobile phone data, which were originally generated for traffic applications. In fact, we built a prototype in about two weeks [18]. Subsequently, we received funding to continue our research and to regularly report to the ministry of research of Germany (e.g. [19, 20]).

The model is used to replay the epidemic in Berlin. This allows important insights into the transmission from government actions to mobility behavior to infection dynamics. Importantly, it will turn out that at least in Berlin and presumably in Germany, the population started reducing its out-of-home activities before the government asked/ordered the population to do so.

Important sub-models of agent-based epidemics models are: contact model, infection model, and disease progression model. These are described in more detail in the following sections.

As stated, we take the contact model from transport modelling, more precisely from activity-based transport modelling (e.g. [21, 22, 23]). Such models generate complete daily activity chains of persons, for example something like home-work-shop-home-leisure-home. Activities come with times and, importantly, locations. The activity chains are normally used as input for (agent-based) transport simulations, which assign modes and routes and possibly re-adjust times and locations, and thus generate emergent effects such as congestion and emissions [24]. In the present paper, they are instead used as input to an epidemic model. For these activity chains, one could, for example, use trajectories from mobile phone data [25]. These trajectories are often not available, for example for privacy reasons. It is, however, possible to generate synthetic approximations to these trajectories. One approach is to use information from mobile phone data (but not the full trajectories), and process them together with information about the transport system and with statistical information from other surveys [26, 27]. That approach leads to synthetic movement trajectories for the complete population (cf. Fig. 1).

From these trajectories, we extract how much time people spend with other people at activities or in (public transport) vehicles. That is, infection opportunities are directly taken from the input data.

Handling of large facilities The resolution of our input data comes at the level of "facilities". Those can be interpreted as buildings or sometimes blocks. They often contain multiple households, multiple company offices, multiple leisure facilities, multiple shops, etc. For home activities, we split persons living in the same facility into realistic household sizes with a maximum number of six people per household [28]. This seems important since the within-household dynamics of COVID-19, and in particular the fact that the secondary attack rate in households seems to be far below 100%, plays an important role (e.g. [29]). For all other activities, we divide the facilities by some globally set factor, called $N_{\text{spacesPerFacility}}$. That is, if two persons spend overlapping time at the same facility, the probability that they have interacted is $1/N_{\text{spacesPerFacility}}$. This has important ramifications for multi-day modelling and mixing, see below.

Multi-day modelling Optimally, one would have multi-day trajectories. In our case, the data that we have ends at the end of the day. Our simulations thus run the same person trajectories again and again (except for weekends, see below). This presumably underestimates mixing, since it is plausible to assume that there is some variation in activity patterns from day to day. At this point, one needs to make a decision whether our sub-spaces (see above) are frozen, meaning that the same sub-groups meet every day, or not. Using the same sub-groups every day arguably is plausible for office buildings, which may contain offices for several companies, and interaction may be limited to sharing an elevator. It is less plausible for public transport trains, where passengers are arranged differently every day. Possibly, a mix between the two approaches is plausible, introducing the need for even more free parameters. In our present model, we opt for the non-frozen setting, i.e. the other persons within a facility that an ego person interacts with are randomly re-drawn for every new simulated day.

$N_{\text{spacesPerFacility}}$ evidently influences the number of contacts that a person has. For our simulations, we set it such that the number of contacts is roughly consistent with real-world contact tracing. For our current input data, that leads to a setting of $N_{\text{spacesPerFacility}} = 20$.
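The sub-space mechanism can be illustrated with a minimal sketch. It assumes, as one possible realization that the text does not prescribe, that every person present at a facility is placed uniformly at random into one of the sub-spaces each simulated day, so that two co-present persons share a sub-space with probability 1/N spacesPerFacility. The function name `daily_contacts` is hypothetical, and time overlap within the day is ignored here.

```python
import random
from itertools import combinations


def daily_contacts(persons, n_spaces_per_facility, rng):
    """Return the pairs of co-present persons that share a sub-space today.

    Assumption: sub-spaces are re-drawn for every call, which mirrors the
    non-frozen setting (the same persons may meet different others each day).
    """
    space_of = {p: rng.randrange(n_spaces_per_facility) for p in persons}
    return [(a, b) for a, b in combinations(persons, 2)
            if space_of[a] == space_of[b]]


if __name__ == "__main__":
    rng = random.Random(42)
    # Two persons at the same facility on many simulated days.
    days = 20000
    met = sum(len(daily_contacts(["p1", "p2"], 20, rng)) for _ in range(days))
    print("empirical interaction probability:", met / days)
```

Calling the function once per simulated day re-draws the groups; a frozen setting would instead store `space_of` once and reuse it.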

Weekend modelling As already alluded to above, we use separate models for Saturdays and Sundays. They come out of transport modelling in the same way as we obtain the model for a "typical weekday" (see above). These models use the same synthetic persons and facilities, and thus can be aligned with the weekday model. In consequence, each synthetic person in our models, starting on Monday, (a) repeats the same weekday five times, (b) runs her Saturday schedule, (c) runs her Sunday schedule, and then starts over.
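The weekly cycle described above amounts to a simple mapping from the simulation day to one of three stored schedules. The following sketch assumes that day 0 is a Monday; the function name and the string labels are illustrative only.

```python
def schedule_type(day_index):
    """Map a simulation day (0 = first Monday) to the schedule that is replayed."""
    day_of_week = day_index % 7
    if day_of_week < 5:
        return "weekday"  # the same typical weekday, repeated five times
    return "saturday" if day_of_week == 5 else "sunday"


if __name__ == "__main__":
    print([schedule_type(d) for d in range(8)])
```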

25% sample For computational reasons, we use a 25% sample of the full population. The sample is constructed by choosing 25% of all persons in the population randomly and retaining their full trajectories. The splitting of households as described above is done after the sampling, meaning that we have realistic household sizes in the 25% scenario but consider only 25% of them; also, the number of contacts used to set the parameter $N_{\text{spacesPerFacility}}$ (see above) is determined for the 25% model. We have also run the full 100% model to check that there are no major differences. The 25% model allows runs to finish within a single-digit number of hours, which was and is important for fast model turn-around, driven by the necessity for fast progress given the demand for the results by the decision makers. All results are reported after upscaling to 100%.

Once two persons are identified to have contact, and one of them is contagious and the other is susceptible, there is a probability of an infection. For this, we use a mechanical model by Smieszek [15, 30]: Infected persons generate a "viral load" that they exhale, cough or sneeze into the environment, and people close by are exposed. Overall, the probability for person n to become infected by this process in a time step t is described as

$$p_{n,t} = 1 - \exp\Big( -\Theta \sum_m sh_{m,t} \cdot ci_{nm,t} \cdot in_{n,t} \cdot \tau_{nm,t} \Big) \tag{1}$$

where the sum runs over all other persons m, sh is the shedding rate (∼ microbial load), ci the contact intensity, in the intake (reduced, e.g., by a mask), τ the duration of interaction between the two individuals, and Θ a calibration parameter (see footnote 1).

For small values of the exponent, one can approximate Eq. (1) as

$$p_{n,t} \approx \Theta \sum_m sh_{m,t} \cdot ci_{nm,t} \cdot in_{n,t} \cdot \tau_{nm,t} \tag{2}$$

We do not use this approximation in our computer implementation, but it helps understanding the following arguments.
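As a numerical illustration, the sketch below evaluates the exponential form 1 − exp(−Θ Σ sh · ci · in · τ) that the variable definitions and footnote 1 imply for Eq. (1), next to its small-exponent approximation. The function names and the example values (Θ = 1e-6, one hour of contact) are hypothetical and are not calibrated values from the model.

```python
import math


def infection_probability(contacts, theta):
    """Exact form: contacts is an iterable of (sh, ci, in_, tau) tuples,
    one tuple per infectious person the susceptible person interacts with."""
    exponent = theta * sum(sh * ci * in_ * tau for sh, ci, in_, tau in contacts)
    return 1.0 - math.exp(-exponent)


def infection_probability_linear(contacts, theta):
    """Linearized form, valid only while the exponent is small."""
    return theta * sum(sh * ci * in_ * tau for sh, ci, in_, tau in contacts)


if __name__ == "__main__":
    one_hour = [(1.0, 1.0, 1.0, 3600.0)]  # unmasked contact for 3600 seconds
    print(infection_probability(one_hour, 1e-6))
    print(infection_probability_linear(one_hour, 1e-6))
```

The exact form always stays below one, whereas the linear form would exceed one for long or intense contacts, which is one reason to keep the exponential in an implementation.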

All parameters can be given in arbitrary units as long as they are always the same since the units are absorbed by Θ.

Contact intensities For SARS-CoV-2, it is plausible to assume that a large share of the virus material is shed as aerosol [31] . In consequence, the first relevant term to compute the viral concentration in the air is the shedding rate, sh.

For such aerosols, it is plausible to assume that they mix quickly into the room, leading to the same uniform concentration everywhere [32]. Evidently, that concentration is inversely proportional to room size: if the room is twice as large, the resulting concentration is half as large.

Next, air exchange plays a role. One could, for example, assume that the windows are opened once per hour, and all of the air is replaced with outside air. This would correspond to an air exchange rate of 1/h. If one assumes a constant rate of virus emission, there would be a linear increase of concentration up to the opening of the window, after which (in a theoretical model) the virus concentration in the air would quickly drop to zero. The average virus concentration over this process would be half as much as the maximum concentration just before window opening. In consequence, the resulting average concentration is inversely proportional to the air exchange rate: if the air is exchanged twice as often, the resulting average virus concentration is half as large. This also holds for continuous air exchange, e.g. by mechanical means.

Footnote 1: Note that this is structurally the same as a continuous-time dynamics with rate $\beta$ and a limited infective period: the probability to not become infected during $\delta t$ is $1 - \beta\,\delta t$; if there are $\tau/\delta t$ subsequent such periods, that probability becomes $(1 - \beta\,\delta t)^{\tau/\delta t}$, and the probability to become infected thus becomes ([7], section V.4) $1 - \lim_{\delta t \to 0} (1 - \beta\,\delta t)^{\tau/\delta t} = 1 - e^{-\beta \tau}$. However, in our case $\tau$ only goes from the beginning to the end of the overlap, and is repeated every day again, given that both persons are still in the same states.

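All of the above together replaces Eq. (2) by

$$p_{n,t} \approx \Theta \sum_m sh_{m,t} \cdot \frac{1}{rs \cdot ae} \cdot in_{n,t} \cdot \tau_{nm,t} \tag{3}$$

where rs is the size of the room, and ae is the air exchange rate. That is, it sets

$$ci = \frac{1}{rs \cdot ae} \tag{4}$$

Again, the physical units are absorbed into Θ; note, however, that the air exchange rate ae is defined as exchanging air for the full room, and not in, say, cubic meters.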

Estimation of room sizes As stated above, our data resolves down to the level of "facilities". These correspond roughly to buildings. In consequence, such a facility can be anything from a single family home to a large office building to a sports arena.

Since our simulation tracks when persons are at facilities, we can, for each facility, obtain the maximum number of persons at that facility, $N^{\text{personsAtFacility}}_{\max}$, over the day. In addition, one can obtain typical floor space per person, fs, from regulatory norms and other sources (see Sec. 2.2). This leads to

$$\text{facility size} = N^{\text{personsAtFacility}}_{\max} \cdot fs \tag{5}$$

Since we divide all facilities by $N_{\text{spacesPerFacility}}$, this leads for the room size to

$$rs = \frac{N^{\text{personsAtFacility}}_{\max} \cdot fs}{N_{\text{spacesPerFacility}}} \tag{6}$$

recall that $N_{\text{spacesPerFacility}} = 1$ for home activities.

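Concrete numbers Overall, the above results in

$$ci = \frac{N_{\text{spacesPerFacility}}}{N^{\text{personsAtFacility}}_{\max} \cdot fs \cdot ae} \tag{7}$$

which, writing $N^{\text{personsInRoom}}_{\max} = N^{\text{personsAtFacility}}_{\max} / N_{\text{spacesPerFacility}}$ for the maximum number of persons per room, can be re-written as

$$ci = \frac{1}{N^{\text{personsInRoom}}_{\max}} \cdot ci' \tag{9}$$

with the "normalized" contact intensity

$$ci' = \frac{1}{fs \cdot ae} \tag{10}$$

See Table 1 for values of $ci'$.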
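The chain from facility occupancy to contact intensity can be checked with a few lines of arithmetic. The sketch follows the verbal description: room size equals maximum occupancy times floor space per person, divided by the number of spaces per facility, and the contact intensity is the inverse of room size times air exchange rate. All numbers in the example (100 persons, 10 units of floor space per person, air exchange rate 1) are made up for illustration and are not values from Table 1.

```python
def contact_intensity(n_persons_at_facility_max, n_spaces_per_facility,
                      floor_space_per_person, air_exchange_rate):
    """Contact intensity ci as the inverse of (room size * air exchange rate)."""
    room_size = (n_persons_at_facility_max * floor_space_per_person
                 / n_spaces_per_facility)
    return 1.0 / (room_size * air_exchange_rate)


def normalized_contact_intensity(floor_space_per_person, air_exchange_rate):
    """Occupancy-independent part: ci' = 1 / (fs * ae)."""
    return 1.0 / (floor_space_per_person * air_exchange_rate)


if __name__ == "__main__":
    ci = contact_intensity(100, 20, 10.0, 1.0)
    ci_norm = normalized_contact_intensity(10.0, 1.0)
    persons_in_room_max = 100 / 20
    # Both routes give the same value: ci = ci' / N_personsInRoom_max.
    print(ci, ci_norm / persons_in_room_max)
```

Doubling the air exchange rate or the floor space per person halves the result, in line with the proportionality arguments above.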

Although Eqs. (9) and (10) really mean the same thing as Eqs. (4) to (7), we find them more difficult to interpret. $ci'$ is the easy part: it parameterizes the "closeness" of the interaction. But why should that number be divided by $N^{\text{personsInRoom}}_{\max}$, as Eq. (9) implies? The reason is that facilities/rooms that are used simultaneously by a larger number of people are also expected to be larger. And, from a personal perspective, if we share a room with one infectious other person, then our probability to become infected is, all other things being equal, indeed half as large if the room is twice as large. However, when it is twice as large, then there will presumably also be twice as many persons in it, doubling our own risk, and thus in the average cancelling out the effect of the larger room size. This second effect, however, is computed directly by our contact model (Sec. 2.1), and thus does not have to be included into the contact intensity. This has the additional advantage that if a person is in a large container outside its peak usage, the model will calculate a much reduced infection probability. Examples for this are public transport vehicles, premises for large events, or restaurants.

Children Current research implies that the susceptibility and infectivity are reduced for children compared to adults. We model this by including the susceptibility and infectivity into Eq. (1). For adults both parameters are set to one. For people below the age of twenty the infectivity is reduced to 0.85 and the susceptibility to 0.45 [39]. Note that this does not mean that the infection probability for children is necessarily lower than for adults because children are more likely to perform activities with a high contact intensity (as shown in Table 1; also see Sec. A.1.4 in the appendix).

The disease progression model is taken from the literature [40, 41, 42, 43, 44, 45] (also see [46] ). The model has states exposed, infectious, showing symptoms, seriously sick (= should be in hospital), critical (= needs intensive care), and recovered. The durations from one state to the next follow log-normal distributions; see Fig. 2 for details. We use the same age-dependent transition probabilities as [9] , shown in Tab. 2.

Infecting another person is possible during infectious, and while showing symptoms, but no longer than 4 days after becoming infectious. This models that persons are mostly infectious relatively early through the disease [41] , while in later stages the infection may move to the lung [42] , which makes it worse for the infected person, but seems to make it less infectious to other persons. 

This paper presents simulation results for the metropolitan area of Berlin in Germany, with approx. 5 million people. A typical simulation run looks as follows:

1. One or more exposed persons are introduced into the population.

2. At some point, exposed persons become infectious. From then on, every time they spend time together with some other person in a vehicle or at some activity, Eq. (1) is used to calculate the probability that the other person, if susceptible, can become infected (= exposed). If infection happens, the newly infected person will follow the same progression.

3. Infectious persons eventually move on to other states, as described in Fig. 2 .

The model runs many days, until no more infections occur.

Most parameters of the model are taken from the literature, as explained in Sec. 2.3. The remaining free parameters are, from Eq. (1), Θ, sh, and in. We have set the base values of sh = in = 1. As mentioned in Section 2.2, we use these parameters to model the wearing of masks, meaning that they are reduced when masks are worn. The remaining Θ parameter was then calibrated so that in the base case, the number of cases doubles every three days. This is a plausible fit to the initial days which were totally without restrictions, which is what our base case represents. Calibrating Θ in our model corresponds to calibrating β, the infection rate, in a compartmental model. The result can be seen in Fig. 3 . 

As additional elements, we feed into the simulation the elements (a) disease import, (b) reductions in activity participation, (c) mask compliance during shopping and in public transport, and (d) the effects of outdoors vs. indoors season. All of these elements are obtained from data, so we do not need to guess them.

In addition, we add contact tracing followed by quarantine-at-home, since that is important for the behavior of the epidemics in Berlin in fall.

Unfortunately, it is not possible to add these aspects one by one, since every time one aspect is added, the value of Θ needs to be re-calibrated. We will therefore explain all four aspects in the following subsections, and then show model runs where all four elements are used for the model. Sensitivity tests, where each of these elements is individually removed, are provided in the appendix (Sec. A.1).

We take the disease import from abroad from data published by RKI ([47], always on Tuesdays). Currently, for Germany this data is only available on a nationwide aggregated level. For this reason we scale it down to our Berlin model by using the population size. The data is dated on the reporting date and not on the actual date of becoming sick. Since the infection seeds are initiated into our model with the status exposed (see Fig. 2) and it can be assumed that the reporting date is significantly after the exposure date, we date the data from RKI back by one week. The data provided by RKI is available as weekly values, so we assign these values to the respective Monday and then interpolate between them. The initially infected persons are drawn randomly from the population. The resulting disease import is shown in Fig. 4.
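The preprocessing of the import data (scale by population, shift back by one week, interpolate between Mondays) can be sketched as follows. This is an illustration under assumptions: the function name, the input format, and the example numbers are hypothetical; linear interpolation is assumed; and the text does not specify whether weekly totals are converted to daily rates, so the sketch simply interpolates the given values.

```python
from datetime import date, timedelta


def daily_import(weekly_cases, population_share, shift_days=7):
    """weekly_cases: dict mapping the Monday of a reporting week to a
    nationwide value. Returns a dict with one interpolated value per day."""
    # Scale to the model region and move each value back towards the exposure date.
    points = sorted((monday - timedelta(days=shift_days), value * population_share)
                    for monday, value in weekly_cases.items())
    result = {}
    for (d0, v0), (d1, v1) in zip(points, points[1:]):
        span = (d1 - d0).days
        for k in range(span):
            result[d0 + timedelta(days=k)] = v0 + (v1 - v0) * k / span
    if points:
        result[points[-1][0]] = points[-1][1]
    return result


if __name__ == "__main__":
    weekly = {date(2020, 3, 9): 100.0, date(2020, 3, 16): 170.0}  # made-up values
    for day, value in daily_import(weekly, population_share=0.05).items():
        print(day, round(value, 2))
```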

Approach During the unfolding of the epidemics, people decided or were ordered to no longer participate in certain activities. We model this by removing an activity from a person's schedule, plus the travel to and from the activity. In consequence, that person no longer interacts with people at that activity location, and in consequence neither can infect other persons nor can become infected during that activity. Overall, this reduces contact options, and thus reduces epidemic spread. A very important consequence of our modelling approach is that we can take that reduction in activity participation from data. Unfortunately, the activity type detection algorithm is not very good for these unusual activity patterns, as one can see in Fig. 5 when knowing that all educational institutions were closed in Berlin after Mar/15. What is reliable, though, is the differentiation between at-home and out-of-home time, as displayed in Fig. 6 . One clearly notices that out-of-home activities are somewhat reduced after Mar/8, and dramatically reduced soon after. After some experimentation, it was decided to take weekly averages of the activity non-participation, and use that uniformly across all activity types in our model, except for educational activities, which were taken as ordered by the government.

To remove an activity with a certain probability, a random draw is made every time a synthetic person has that activity type in its plan. This means that the model assumes that, say for a 50% work reduction, there will be a different 50% subset of persons at work every day. This intervention, in consequence, does not sever infection networks, but just slows down the dynamics.

Behavioral interpretation of government interventions vs mobility data One striking consequence of the activity participation data (Fig. 6) is that, after the initial government intervention from Mar/7 that cancelled large events and raised awareness, the population reaction in fact preceded the government interventions, rather than the other way around. Out-of-home activity participation was already reduced between Mar/7 and Mar/14, before the second government intervention that closed schools, clubs, and bars. Similarly, there was a further considerable drop between Mar/14 and Mar/21, again before the so-called contact ban (strongly reduced interaction between different households) and closing of all restaurants and non-essential stores in Germany. At least for Berlin and probably for Germany, it is thus not true that the government forced society and the economy to come to a halt; rather, society did this by itself, and the government presumably stabilized or reinforced behavior that was happening anyway.
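The per-activity random draw used for the activity reductions can be sketched as below. The plan representation, the function name, and the rule that home activities are never removed are assumptions made for this illustration.

```python
import random


def thin_plan(plan, remaining_fraction, rng):
    """plan: list of activity types for one person and one day.
    remaining_fraction: dict mapping activity type to the share still performed."""
    kept = []
    for activity in plan:
        fraction = remaining_fraction.get(activity, 1.0)
        # Independent draw per activity and per day, so a different subset of
        # persons skips the activity on each simulated day.
        if activity == "home" or rng.random() < fraction:
            kept.append(activity)
    return kept


if __name__ == "__main__":
    rng = random.Random(1)
    plan = ["home", "work", "shop", "home", "leisure", "home"]
    print(thin_plan(plan, {"work": 0.5, "leisure": 0.5, "shop": 0.5}, rng))
```

Because the draw is repeated every day, a 50% reduction thins contacts without permanently separating any two persons, which matches the statement that infection networks are slowed rather than severed.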

In April the wearing of masks in shops and in public transport vehicles became obligatory in Berlin [48]. We have included this into the infection model of Eq. (1) by reducing sh (if the contagious person wears a mask) and in (if the person to be potentially infected wears a mask). This is dependent on the activity type, meaning that persons only wear masks when shopping or using public transport. The effectiveness of different mask types is taken from [49], i.e. cloth masks reduce shedding and intake to 0.6 and 0.5 of their original values, surgical masks to 0.3 and 0.3, and N95 (FFP2) masks to 0.15 and 0.025. The review article [50] comes up with about 0.05 for N95 masks, a factor of two larger, but still displaying a very large reduction. The same paper [50] also shows that "masks" without a specification of the type have much less of an effect. Finally, there may be the issue that lay people may not be able to use N95 masks at full efficiency. In consequence, any of our results that depend on N95 mask efficiency have to be interpreted "mechanically": They are plausible under the assumption that the fraction of people specified in the model is indeed able to use N95 masks effectively.

The local transport company in Berlin (BVG, [51]) has provided us with the compliance rates in public transport over time, meaning that we do not have to estimate them. We assume that the same compliance rates also apply to shopping activities. We assume that 90% of those people wearing masks wear cloth masks and 10% wear N95 masks.

The probability of getting infected during an encounter depends on whether the encounter takes place indoors or outdoors. Outside, the probability of infection is significantly reduced compared to inside. This is due to the fact that outdoors the air is constantly in motion and therefore aerosols cannot accumulate. We assume that an encounter outdoors decreases the infection probability by one order of magnitude [52, 31].
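To get a feeling for the combined effect of compliance and mask mix, the following sketch computes population-average shedding and intake factors from the values quoted above (cloth 0.6/0.5, N95 0.15/0.025, 90% cloth and 10% N95 among wearers). Averaging over the population is a simplification chosen for this illustration; in an agent-based model the mask type would be assigned per person. The function name is hypothetical.

```python
# (shedding factor, intake factor) per mask type, as quoted in the text
MASK_FACTORS = {
    "none": (1.0, 1.0),
    "cloth": (0.6, 0.5),
    "surgical": (0.3, 0.3),
    "n95": (0.15, 0.025),
}


def expected_factors(compliance, cloth_share=0.9, n95_share=0.1):
    """Population-average (sh, in) multipliers for a given compliance rate."""
    mix = {
        "none": 1.0 - compliance,
        "cloth": compliance * cloth_share,
        "n95": compliance * n95_share,
    }
    sh = sum(weight * MASK_FACTORS[mask][0] for mask, weight in mix.items())
    in_ = sum(weight * MASK_FACTORS[mask][1] for mask, weight in mix.items())
    return sh, in_


if __name__ == "__main__":
    for compliance in (0.0, 0.5, 1.0):
        print(compliance, expected_factors(compliance))
```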

In countries like Germany, seasonality has a great influence on how much time people spend outside. Based on the German time budget survey [53] and a survey on physical activities [54] 

The goal of contact tracing is to break chains of transmission by tracing the contacts of an infected person and putting these contacts into quarantine. In our model contacts are traced during all activities except for public transport and shopping because we assume that the health authorities are not able to find these contacts. A contact person is only traced when the contact duration is longer than 15 minutes, which corresponds to the RKI guidelines [55] .

Persons that go into showingSymptoms are assumed to trigger a contact tracing mechanism, which works as follows:

1. Look at all traced contacts that the infected person had in the 2 days [55] before showing symptoms.

2. A probability γ determines if a contact person can be reached successfully and also follows the stay-at-home order. γ is set to 0.6.

3. The persons that have been traced successfully go into quarantine, but only after a delay of d days, which models the response time of the system. Our base value of d is set to 4 days. Personal experience in our surroundings says that tests are normally taken a day after symptoms start, and the result is available again one day later in the evening. That is, contact tracing can start no earlier than 3 days after symptoms onset. We add another day to account for possible additional delays.
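The three tracing steps can be condensed into a short sketch. The contact-log format, the activity labels, and the function name are assumptions for illustration; whether the day of symptom onset itself belongs to the 2-day window is also an assumption (here it does).

```python
import random

# Activities whose contacts are assumed to be untraceable by the health authorities.
UNTRACEABLE = {"pt", "shopping"}


def trace_contacts(contact_log, symptom_day, rng, gamma=0.6, delay_days=4,
                   lookback_days=2, min_minutes=15):
    """contact_log: list of (person, day, minutes, activity) tuples for the
    index person. Returns a dict person -> first day of quarantine."""
    eligible = {
        person
        for person, day, minutes, activity in contact_log
        if symptom_day - lookback_days <= day <= symptom_day
        and minutes > min_minutes
        and activity not in UNTRACEABLE
    }
    # One draw per contact person decides whether tracing succeeds.
    return {person: symptom_day + delay_days
            for person in sorted(eligible) if rng.random() < gamma}


if __name__ == "__main__":
    log = [("a", 9, 30, "work"), ("b", 9, 10, "work"),
           ("c", 9, 60, "pt"), ("d", 5, 60, "leisure")]
    print(trace_contacts(log, symptom_day=10, rng=random.Random(3)))
```

In the example, only person "a" is eligible: "b" falls below the 15-minute threshold, "c" was met in public transport, and "d" was met too long before symptom onset.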

The simulation is calibrated against the Berlin case numbers (Fig. 8). COVID-19 is a notifiable disease, and the notifications are collected and published by the Robert Koch Institute (RKI) [58]. Each record contains at least two dates: the date when the record reaches the local health department (reporting date), and the date when symptoms started, called reference date.

For more information see https://covid-sim.info/2020-11-03/sensitivityAggr.

In principle, the reference date would be easier to compare with our simulations, since it corresponds to the onset of our showingSymptoms state. Unfortunately, however, it is not clear how reliable that date is. The health department becomes aware of cases once they test positive. The positive test result becomes available about 2 days after the sample was taken. The health authorities thus have to connect a positive test with the person, and query the person about when symptoms started. Self-reported dates of symptoms onset are presumably rather unreliable, in part because of recall errors, in part because what counts as a symptom is not sharply defined. The reliability may be improved by using expert interviewers, but those may not always be available. In addition, when tests are taken from pre- or asymptomatic cases, a date of symptoms onset is not yet available, and for asymptomatic cases never will be. In such cases, the reporting date is also entered as reference date, which for pre-symptomatic cases is too early. Finally, many records are reported completely without this reference date. RKI provides a procedure to impute the missing reference date [59], but has to rely on the statistical distribution of the cases where a reference date exists, which may not be a valid assumption since, say, locations that are under stress of high infection numbers may both not enter the reference date and receive the test results with additional delay.

In consequence, we decided to plot the case numbers both by reporting and by reference date for comparison. The result is shown in Fig. 8 (top), where the blue line traces the number of new cases with state showingSymptoms from our simulation. Fig. 8 (bottom) shows the resulting reinfection rate R. To obtain that number, the reinfections caused by each synthetic person in the simulation are registered backwards to the date where that person turned contagious, and then averaged over all persons turning contagious on that date.
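The backward registration of reinfections can be written down in a few lines. The input format, one (day the person turned contagious, number of persons it infected later) tuple per synthetic person, and the function name are assumptions for this sketch.

```python
from collections import defaultdict


def reinfection_rate_by_day(persons):
    """persons: iterable of (day_contagious, n_infected_later) tuples.
    Returns the average number of reinfections per day of turning contagious."""
    total = defaultdict(int)
    count = defaultdict(int)
    for day, n_infected in persons:
        total[day] += n_infected
        count[day] += 1
    return {day: total[day] / count[day] for day in sorted(count)}


if __name__ == "__main__":
    print(reinfection_rate_by_day([(1, 2), (1, 0), (2, 3)]))
```

Since imported cases are seeded from outside, case numbers can rise while this R stays below one, as discussed for the summer period below.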

In terms of calibration, the initial growth is, within limits, insensitive to changes of Θ, since it is dominated by the disease import. This can be explained by the fact that the exponential growth was running ahead in other areas, and in consequence the share of infected persons from those areas also grew exponentially. Only after travel was stopped, disease import also stopped, and the dynamics in Berlin was dominated by internal processes. What is sensitive to Θ is the downward slope around April. In consequence, we adjust Θ such that the downward slope in the logarithmic plot is reproduced, given the activity reductions and mask compliance provided by our data. The result is also compared against hospital numbers (Fig. 9), which confirms our calibration.

The case numbers over the summer contain one large outbreak in a religious community, where we would claim that our model does not pick up such special cases, or at best as the possibility of large fluctuations in certain regimes. Otherwise, over the summer, the reinfection rate R was below one (Fig. 8 bottom).

In the middle of July, the infection dynamics starts picking up again. The corresponding R, however, is mostly below one. Presumably, this is the consequence of the disease import (cf. Fig. 4); note that our R is taken directly from the simulation, i.e. we register backwards at the date when a person turns contagious how many persons it will infect later. That is, there are increasing case numbers because there is disease import, but it does not go along with an R that is larger than one.

Evidently, in our microscopic models we can track how many infections happen at which activity type. Fig. 10 shows, on top, the absolute numbers of infection types for the simulation, and below the share of infections per activity type over time. Initially, all activity types play a role. After the closure of the universities, schools, and day care in March, both their absolute numbers and their shares went to zero. At the same time, the infections share of work (pink) in April and May reflects that persons were drifting back to normal activity patterns (cf. Fig. 6). Leisure (orange) would have shown the same trend, but that was counter-acted by the increasing shift of activities to outdoors. In the bottom plot, the red line shows how the share of infections in public transit decreases significantly near the end of April because of increased wearing of masks. (Recall that we use observed mask compliance.) The shares in June cannot be interpreted because the absolute numbers are too low. In July we see how day care (blue) picks up, because it was re-opened. Schools re-open in the second week of August, and pick up accordingly (purple). Also, two weeks of school vacation in October are clearly reflected in the purple curve. From September on we then see a strong increase of the infections share of leisure activities, corresponding to moving leisure activities from outdoors to indoors as explained in Sec. 3.2.4.

The results point to the importance of reducing infections during leisure activities if one wants to keep the dynamics under control during winter. After that, school and work have roughly equal weight. The following interventions would keep these under control:

• For work, it is suggested to either work only in single-person offices, or to wear (N95) masks also at the workplace. According to our computations, this would reduce the contribution of work activities to the overall dynamics to a level that is no longer relevant.

• For school, it is suggested to combine (N95) masks with dividing classes in two and having each half attend school only on alternating days. According to our computations, this would reduce the contribution of school activities to the overall dynamics to a level that is no longer relevant.

Germany has introduced restrictions for leisure activities starting Nov/2. In our mobility data, we accordingly see reductions, but they will at best reduce the share of the leisure infections by a factor of two. This means that it will remain the activity type with the largest share. The interventions for work and school have been recommended, but have not been implemented widely. Our current expectation [62] is that these measures in the leisure sector will be sufficient to stop the (exponential) growth, but they will not be enough to allow for a quick decline of the infection numbers. Additional measures will be needed to achieve that.

Intuition for these results. In an older version of the model [63], we had all contact intensities set to one. The contributions of each activity type to the infection dynamics then corresponded, to first order, to the average weekly time consumption in the respective activity. For example, averaged over the week, school consumes about 5 hours per day for persons going to school. However, since in Berlin only about 10% of the population are school children, the average time consumption for the school activity is only 0.5 hours per day when taken across the whole population. In contrast, there are more persons going to work than to school, thus increasing the weight of work in the infection dynamics. The by far largest weight, however, comes from the leisure activities, which do not necessarily take more hours per week for each individual person, but where all persons contribute to this type of time consumption. In consequence, restricting leisure activities has a large effect.
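The school example is a plain product of time per participant and participation share. A minimal sketch of that arithmetic, using only the two numbers given in the text (the function name `population_average_hours` is hypothetical):

```python
def population_average_hours(hours_per_participant, participant_share):
    """Average daily hours spent in an activity across the whole population."""
    return hours_per_participant * participant_share


# School: about 5 hours per day for pupils, about 10% of the population.
school = population_average_hours(5.0, 0.10)
print(f"school: {school:.1f} hours per day across the population")
```

The same function applies to work and leisure once their hours and participation shares are known; those values are not restated here, so they are left out of the sketch.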

In the present model, this is now multiplied with the normalized contact intensities, cf. Tab. 1. In consequence, leisure, which already had a large share before, now gets even more weight. Work, despite occupying similar amounts of time, is weighted down because of the much smaller normalized contact intensity. On the other end of the scale, public transport has a high normalized contact intensity, but the times spent in public transport are considerably smaller.

Schools and day care are a complicated case: they occupy large amounts of time and have a large normalized contact intensity, both somewhat similar to leisure. In consequence, the re-opening of day care in July and of the schools in August should have had strong consequences for the infection numbers (Sec. A.1.4 in the appendix, in particular Fig. 14). We took the observation that this did not happen as confirmation that their larger-than-average contact intensity is compensated for by a smaller-than-average infectivity and susceptibility (cf. Sec. 2.2). Clearly, this is specific to (our current understanding of) COVID-19. For other diseases, for example influenza, children may have a larger infectivity/susceptibility than adults, which, multiplied with their large contact intensity, would lead to a large contribution to the infection dynamics. In consequence, these sub-models need to be understood and re-calibrated for each individual communicable disease.
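The first-order reasoning of the last three paragraphs can be written as a product of factors. The sketch below is an illustration only: the function name `first_order_weight` is hypothetical, and the contact intensity, infectivity, and susceptibility values are placeholders chosen for readability, not the values of Tab. 1 or Sec. 2.2. Only the 0.5 hours per day for school comes from the text.

```python
def first_order_weight(avg_hours, contact_intensity,
                       infectivity=1.0, susceptibility=1.0):
    """First-order weight of an activity type in the infection dynamics:
    population-average time, scaled by the normalized contact intensity
    and by the relative infectivity and susceptibility of the participants."""
    return avg_hours * contact_intensity * infectivity * susceptibility


SCHOOL_HOURS = 0.5          # from the text: hours per day across the population
PLACEHOLDER_INTENSITY = 10.0  # hypothetical, NOT the value from Tab. 1

# Children treated like adults (factor 1.0 each).
adult_like = first_order_weight(SCHOOL_HOURS, PLACEHOLDER_INTENSITY)

# Children with reduced infectivity and susceptibility (hypothetical 0.5 each).
reduced = first_order_weight(SCHOOL_HOURS, PLACEHOLDER_INTENSITY,
                             infectivity=0.5, susceptibility=0.5)

print(adult_like, reduced)
```

Because the factors multiply, halving both infectivity and susceptibility in this toy setting cuts the school weight to a quarter, which is the kind of compensation the text describes. For a disease where children are more infectious or susceptible than adults, the same product would increase instead.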

Comparison to compartmental models. Arguably, compartmental models are the mainstay of epidemiological modelling. Our approach, in contrast, follows individual synthetic persons. These individual persons can be enriched by person-centric attributes such as age or individual risk factors. Disease progression is individual, taking into account these demographic and other person-centric attributes. Similar to compartmental models, the base reinfection rate and the starting date need to be calibrated from case numbers. However, both the spatial and the social interactions in our model come directly from data. Also, behavioral reductions in activity participation come directly from data. Mechanical aspects such as the wearing of masks by certain persons and/or at certain activity types can be integrated very simply into the model, by reducing virus shedding, virus intake, or both. Travel in public transport is already integrated. Organizational suppression approaches, such as contact tracing, can be simulated mechanically, thus extracting information about the allowed delays between symptom onset and reaching contacts, the failure rate, etc.
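How masks enter such a mechanical model can be sketched as follows. The sketch assumes an exponential dose-response form in which shedding, intake, contact intensity, and contact duration multiply inside the exponent; this form, the function name `infection_probability`, and all numerical values (including the mask factors) are assumptions of this illustration.

```python
import math


def infection_probability(theta, shedding, intake, contact_intensity, duration):
    """Probability that one contact of the given duration leads to infection,
    assuming an exponential dose-response form (an assumption of this sketch)."""
    dose = theta * shedding * intake * contact_intensity * duration
    return 1.0 - math.exp(-dose)


THETA = 0.01           # hypothetical calibration parameter
MASK_SHEDDING = 0.6    # hypothetical reduction of shedding by a mask
MASK_INTAKE = 0.5      # hypothetical reduction of intake by a mask

# One contact of 60 time units at contact intensity 1.0.
no_mask = infection_probability(THETA, 1.0, 1.0, 1.0, 60.0)
infector_masked = infection_probability(THETA, MASK_SHEDDING, 1.0, 1.0, 60.0)
both_masked = infection_probability(THETA, MASK_SHEDDING, MASK_INTAKE, 1.0, 60.0)

print(f"{no_mask:.3f} {infector_masked:.3f} {both_masked:.3f}")
```

The point of the design is that a mask is not a change to a global infection rate but a per-person, per-activity factor, so it can be switched on for selected persons or activity types without touching the rest of the model.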

We were able to bring this up quickly: coding of the infection model was started at the end of Feb/2020; our first preprint is from 20/Mar/2020 [64]; our first report to the government is from 8/Apr/2020 [19]; we have reported to the government regularly since then. Evidently, we were drawing on our experience and expertise with person-centric travel models. Still, it means that, given the right experience and data availability, the method is not overly heavyweight, and it then has many advantages over compartmental models.

The basic behavior of the model is like that of any S(E)IR model, i.e. exponential growth until a sufficient share of the population is immune, followed by exponential decline (cf. blue line in Fig. 8). Also, the beginning and the speed of the growth are calibrated in similar ways. In most models, however, interventions such as reductions in out-of-home activity participation, masks, or contact tracing need to be translated into parameter changes of the S(E)IR model, most notably the infection rate [65, 66, 67, 68]. The only models that use human activity patterns directly that we are aware of are the three models described in [8]. Out of these, we are aware of an application to COVID only by the Imperial College model [9]. Their results are roughly in line with ours. That model, at the time, used a doubling of cases every 10 days; reality, with a doubling every 3 days, was possibly even more dramatic than their predictions. However, their model was purely predictive, i.e. unlike us, they did not use mobility data to gauge the actual reductions in activity participation.
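The difference between the two doubling times mentioned above is easy to underestimate. A minimal sketch of the arithmetic, assuming pure exponential growth (the 30-day horizon and the function names are choices made for this illustration):

```python
import math


def growth_factor(days, doubling_time):
    """Factor by which case numbers grow in `days` under pure exponential growth."""
    return 2.0 ** (days / doubling_time)


def daily_growth_rate(doubling_time):
    """Exponential growth rate per day corresponding to a doubling time."""
    return math.log(2.0) / doubling_time


for doubling_time in (10.0, 3.0):
    print(
        f"doubling every {doubling_time:.0f} days: "
        f"rate {daily_growth_rate(doubling_time):.3f} per day, "
        f"factor {growth_factor(30.0, doubling_time):.0f} after 30 days"
    )
```

Over 30 days, a 10-day doubling time gives three doublings (a factor of 8), while a 3-day doubling time gives ten doublings (a factor of 1024).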

Under-reporting. A known issue with epidemiological data, and thus with the simulations that build on it, is under-reporting, i.e. that there are more cases in reality than in the data. For our model, this would imply "raising" the curve of infections, e.g. in Fig. 8, to a higher level. In order to achieve this, we could, for example, feed the model with initial seeds at some earlier point in time. This would, however, lead to a lower slope in the log-plot for the early days, in contrast to the data, and thus does not seem plausible. An alternative would be to assume that the disease import which drives our initial phase (Fig. 4) is itself under-reported. This would be entirely plausible. It would also increase the hospital numbers (Fig. 9), which would make the curve of seriouslySick more realistic, but make the curve of critical less realistic. Also, other studies point to relatively little under-reporting in Germany [69, 70]. As long as the number of sero-positive persons in Germany remains in the single-digit percentage range [70], the simulation is not strongly affected by this issue.

Predictions. The model is used for predictions. We decided not to include them in the paper since any prediction we make now would quickly be outdated. Our regular reports to the government, and thus our predictions, all have a DOI, for example [19] or [62].

We combine a person-centric human mobility model with a mechanical model of infection and a person-centric disease progression model into an epidemiological simulation model. Different from other models, we take the movements of the persons, including the intervening activities where they can interact with other people, directly from data. For privacy reasons, we rely on a process that takes the original mobile phone data, extracts statistical properties, and then synthesizes movement trajectories from the statistical properties; one could use the original mobile phone trajectories directly if they were available. The model is used to replay the epidemic in Berlin. This allows important insights into the societal transmission from government actions to mobility behavior to infection dynamics. Importantly, it turns out that the population started reducing its out-of-home activities before the government asked/ordered the population to do so. The model is then used to evaluate different intervention strategies, such as closing educational facilities, reducing other out-of-home activities, wearing masks, or contact tracing, and to determine differentiated percentage changes of the reinfection number R per intervention.

For computer code see https://github.com/matsim-org/matsim-episim. Simulations were computed with version 0683bd27d80963e16af95395790959ae5e1578a0 of the code, started with the command

    java -jar matsim-episim-1.0-SNAPSHOT.jar runParallel \
        --setup org.matsim.run.batch.BerlinSensitivityRuns \
        --params org.matsim.run.batch.BerlinSensitivityRuns$Params

Some of the input data, in particular the synthetic mobility traces, are currently under a license, but we are in the final stages of negotiations with the data provider and are confident that they can be made available soon.

The output data used for the figures can be retrieved at https://svn.vsp.tu-berlin.de/repos/public-svn/matsim/scenarios/countries/de/episim/battery/2020-11-03/sensitivity/.

Fig. 12 shows the consequence of wearing masks in public transport and during shopping. There is a noticeable difference to Fig. 8, but the difference is not huge.

Figure 12: Model without masks during shopping/public transport. The calibration parameter Θ is not recalibrated.

Fig. 13 shows the consequence of not doing contact tracing. In consequence, the infection numbers start growing already in June, and at the beginning of November would be considerably larger than they actually are. One also notices that there is now a uniform slope in the logarithmic plot from July to November, while with contact tracing (Fig. 8) one clearly sees the increase in slope in October when the contact tracing becomes overwhelmed.

Schools have re-opened on Aug/8; not applying these two reductions results in an infection dynamics that is much stronger than what was observed in reality.

We conclude that, given current knowledge, all the elements of our model are necessary to track the development so far. 

Containing Papers of a Mathematical and Physical Character

Population biology of infectious diseases: Part I

Human mobility networks, travel restrictions, and the global spread of 2009 H1N1 pandemic

Natural Human Mobility Patterns and Spatial Spread of Infectious Diseases

Effective distances for epidemics spreading on complex networks

Evolution and emergence of infectious diseases in theoretical and real-world networks

Epidemic processes in complex networks

Modeling targeted layered containment of an influenza pandemic in the United States

Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand

Modelling disease outbreaks in realistic urban social networks

FluTE, a publicly available stochastic influenza epidemic simulation model

Forecast and control of epidemics in a globalized world

SABCoM: A Spatial Agent-Based COVID-19 Model

Modeling latent infection transmissions through biosocial stochastic dynamics. medRxiv

A mechanistic model of infection: why duration and intensity of contacts should be included in models of disease spread

Reconstructing the 2003/2004 H3N2 influenza epidemic in Switzerland with a spatially explicit, individual-based model

Epidemic Spreading in Urban Areas Using Agent-Based Transportation Models

Mobility traces and spreading of COVID-19

MODUS-COVID Vorhersage vom 8.4.2020. Technische Universität Berlin

Eine ereignisorientierte Simulation von Aktivitätsketten zur Parkstandswahl. Schriftenreihe des Instituts für Verkehrswesen der Universität Karlsruhe

Demonstration of an activity-based model for Portland

Modeling Week Activity Schedules for Travel Demand Models

The Multi-Agent Transport Simulation MATSim

Assessing the use of mobile phone data to describe recurrent mobility patterns in spatial epidemic models

The Senozon Mobility Model

Mobility Pattern Recognition (MPR) und Anonymisierung von Mobilfunkdaten

Privathaushalte nach Haushaltsgröße im Zeitvergleich

Infection fatality rate of SARS-CoV-2 infection in a German community with a super-spreading event

Models of epidemics: how contact characteristics shape the spread of infectious diseases

FAQs on Protecting Yourself from aerosol transmission

Parameter study for risk assessment in internal spaces regarding aerosols loaded with virus

Senatsverwaltung für Integration AUS

DGUV Information 202-090 -Klasse(n) -Räume für Schulen Empfehlungen für gesund... -Schriften -arbeitssicherheit.de

Planung und Bau von Küchen und Kantinen für 50 bis 1000 Verpflegungsteilnehmer

Verordnung über Arbeitsstätten

Technische Regeln für Arbeitsstätten ASR A3.6. Ausschuss für Arbeitsstätten

DIN Deutsches Institut für Normung. DIN EN 16798-1

The role of children in the spread of COVID-19: Using household data from Bnei Brak, Israel, to estimate the relative susceptibility and infectivity of children

WHO. Report of the WHO-China Joint Mission on Coronavirus Disease

Temporal dynamics in viral shedding and transmissibility of COVID-19

Virological assessment of hospitalized patients with COVID-2019

Charakteristik von 50 hospitalisierten COVID-19-Patienten mit und ohne ARDS

Clinical Characteristics of 138 Hospitalized Patients With 2019 Novel Coronavirus-Infected Pneumonia in Wuhan, China

RKI -SARS-CoV-2 Steckbrief zur Coronavirus-Krankheit-2019 (COVID-19)

COVID-19 infectivity profile correction

Aktueller Lage-/Situationsbericht des RKI zu COVID-19

COVID-19-Pandemie in Berlin

To mask or not to mask: Modeling the potential for face mask use by the general public to curtail the COVID-19 pandemic

Physical distancing, face masks, and eye protection to prevent person-to-person transmission of SARS-CoV-2 and COVID-19: a systematic review and meta-analysis

Willkommen bei den Berliner Verkehrsbetrieben -BVG

Closed environments facilitate secondary transmission of coronavirus disease 2019 (COVID-19)

Ausübung von Sport im Freien in Deutschland

Kontaktpersonen-Nachverfolgung bei respiratorischen Erkrankungen durch das Coronavirus SARS-CoV-2; 2020

Bund und Länder einigen sich auf weitreichende Lockerungen

COVID-19-Dashboard

Schätzung der aktuellen Entwicklung der SARS-CoV-2-Epidemie in Deutschland -Nowcasting

Anstieg bei den Infizierten lässt nach -aber nicht der bei den Todesfällen. Tagesspiegel; 2020

MODUS-COVID Bericht vom 13.11.2020. Technische Universität Berlin

Using mobile phone data for epidemiological simulations of lockdowns: government interventions, behavioral changes, and resulting changes of reinfections

Mobility traces and spreading of COVID-19

Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions

COVID-19 Scenarios

Real-time modeling and projections of the COVID-19 epidemic in Switzerland

Wie kann man mit Statistik die Dunkelziffer der Corona-Infektionen bestimmen?

Serologische Untersuchungen von Blutspenden auf Antikörper gegen SARS-CoV-2 (SeBluCo-Studie)

We thank Kai Martins-Turner and Dominik Ziemke for discussions. We are grateful to BVG (Berlin public transit operator) for providing the mask compliance rates, which they surveyed on a daily basis. The work on the paper was funded by the German Federal Ministry of Education and Research (BMBF) (01KX2022A) and TU Berlin; regular reports can be found through this search: https://depositonce.tu-berlin.de/simple-search?query=modus-covid. Zuse Institute Berlin (ZIB) provided CPU time.

This section discusses what happens when certain elements of the model are switched off.

A.1.1 Disease import. Fig. 11 (top) shows what would happen if disease import was not included in the model. Since the initial growth of the epidemic is now too slow, the calibration parameter Θ is re-calibrated so that the initial growth is reproduced correctly (Fig. 11, bottom). One clearly notices that this results in much more infection activity in the model than in reality. That is, according to our model the fast growth in March in Berlin was partially caused by disease import; without disease import, we need to use a Θ, and thus a reinfection rate, that is too high for what follows afterwards.

Figure 11: Model where the time-dependent disease import (Fig. 4) is replaced with a constant disease import (4 persons per day = 1 person per day in 25% sample). TOP: The infection numbers come out too low. BOTTOM: After re-calibrating Θ to the initial growth in March, the infection numbers afterwards come out too high.