key: cord-0303132-pui29w9j authors: Petala, M.; Kostglou, M.; Karapantsios, T.; Dovas, C.; Lytras, T.; Paraskevis, D.; Roilides, E.; Koutsolioutsou-Benaki, A.; Panagiotakopoulos, G.; Sypsa, V.; Metallidis, S.; Papa, A.; Stylianidis, E.; Papadopoulos, A.; Tsiodras, S.; Papaioannou, N. title: Relating SARS-CoV-2 shedding rate in wastewater to daily positive tests data: A consistent model based approach date: 2021-07-06 journal: nan DOI: 10.1101/2021.07.04.21259903 sha: db3d661c0a9ad99ea31d9a15b24e4b68694755bf doc_id: 303132 cord_uid: pui29w9j During the COVID-19 pandemic, wastewater-based epidemiology (WBE) has been engaged to complement medical surveillance and in some cases to also act as an early diagnosis indicator of viral spreading in the community. Most efforts worldwide by the scientific community and commercial companies focused on the formulation of protocols for SARS CoV-2 analysis in wastewater and approaches addressing the quantitative relationship between WBE and medical surveillance are lacking. In the present study, a mathematical model is developed which uses as input the number of daily positive medical tests together with the highly non-linear shedding rate curve of individuals to estimate the evolution of virus shedding rate in wastewater along calendar days. A comprehensive parametric study by the model using medical surveillance data for the city of Thessaloniki (~700,000 inhabitants, North Greece) reveals the conditions under which WBE can be used as an early warning tool for predicting pandemic outbreaks. It is shown that early warning capacity is not the same along the days of an outbreak and varies strongly with the number of days apart between the day of maximum shedding rate of individuals in their disease cycle and the day of their medical testing. Moreover, the present data show that there exists a proportion between unreported cases (asymptomatic persons and patients with mild symptoms that do not seek medical advice) and reported cases. The proportion is not steady but increases with the number of reported cases. The early detection capacity of WBE improves substantially in the presence of an increasing number of unreported cases. For Thessaloniki at the peak of the pandemic in mid-November 2020, the number of unreported cases reached a maximum around 4 times the number of reported people. indicate a time delay up to 8 days between wastewater signal and infected cases reported in public health surveillance system (e.g. (D'Aoust et al., 2021; Nemudryi et al., 2020; Peccia et al., 2020) ). This is in line with the reported incubation period of around 6 days from exposure/infection to onset of symptoms of COVID-19 Lauer et al., 2020; Li et al., 2020) . Therefore, the discrepancy between the onset of viral shedding and the onset of symptoms is actually an advantage that might explain the early detection capacity of WBE. Another important reason of discrepancy between daily reports of wastewater viral titers and daily reports of new confirmed cases is the cumulative character of viral shedding rate in wastewater during the disease. Several clinical studies 14-16 showed that viral shedding is not uniform during the disease, but there is a maximum shedding rate in the very acute phase of infection, probably even before respiratory symptoms appear, being followed by an exponential decline in subsequent days. If one considers that shedding lasts at least 3-4 weeks after the inception of symptoms 14,15. it is apparent that daily wastewater viral titers correspond more to the cumulative shedding of infected people rather than to the shedding of the daily new cases. It must be stressed here that the evidence from clinical studies is limited, and only for samples collected from hospitalized patients, thus after symptoms onset, so one should be extremely careful in estimating shedding rates at the first days right after infection. If one aims to associate wastewater surveillance data with public health surveillance data, the different sources of bias between these data should be also taken into account. Laboratory practice shows that variation in wastewater data is considerably high from one day to the next because of the many experimental parameters that are involved, e.g., individual's shedding rate (orders of magnitude variation among infected people ; sampling protocol, sample concentration techniques, RNA extraction methodology, normalization with respect to population indicators and flow rate, inhibition assessment in RNA recovery, RNA quantification, etc. On the other hand, public health surveillance is susceptible to a systematic bias because of limitations in the capacity of medical testing along with the unpredictable human behavior under stressful conditions. In principle, one should be concerned about the correctness of both wastewater viral measurements and clinical testing data. The goal of this work is to setup a methodology for the consistent comparison between the number of infected people reported by clinical surveillance and SARS-CoV-2 RNA concentration in wastewater samples. To do so, a model is developed for the estimation of the evolution of virus shedding rate in a sewage system based on the daily positive medical tests. Apart from corrections based on the laboratory controls suggested by CDC, virus shedding rate data are further rationalized with respect to physicochemical parameters of wastewater to account for virus loss by adsorption to sewage solids (Petala et al., 2021) . To increase accuracy, clinical testing data are based on the date of specimen collection and not on the reporting date. The model is first developed in a generalized continuous time regime and then it is transformed to its discrete counterpart dictated by the daily basis data. Wastewater samples were collected at the exit of Thessaloniki's (a city at North Greece) main sewerage pipe, right before the entrance at the wastewater treatment plant of the city, as described elsewhere (Petala et al., 2021) . This plant serves an estimated population of about 700,000 people. The present work reports 24-hours composite samples (1L each) taken three times per week (Monday-Wednesday-Friday) from October 5 th , 2020 until January 6 th , 2021. This period includes the second wave burst of COVID-19 for Thessaloniki in November 2020 which was the worst ever in the whole country. Typical range of values of the physicochemical parameters of wastewater samples for the examined period are displayed in Table 1 (supplementary materials) . These parameters were employed in the rationalization of the measured viral concentration with respect to quality characteristics of wastewater according to Petala et al. (Petala et al., 2021) . Upon wastewater collection, 200 mL of each wastewater sample was centrifuged at 4000xg for 30 min to remove particles and pH of the supernatant was adjusted to 4 using a solution of 2.0 M HCl. An aliquot of 40 mL supernatant was then passed through an electronegatively charged surface of 0.45 μm-Ø47 mm cellulose nitrate HA membrane ((HAWP04700; Merck Millipore Ltd., Tullagreen, Ireland). Filtration was performed using a magnetic funnel mounted on a glass filtration flask (Pall Corporation) Filtration step was conducted in triplicate and membranes were stored in 15 mL falcon tubes for further processing of RNA extraction and virus quantification. For each sewage sample, three electronegative membranes were individually subjected to phenolchloroform-based RNA extraction process (Chaintoutis et al., 2019) coupled with magnetic bead binding. Each membrane filter was rolled into a Falcon™ 15 mL conical centrifuge tube with the top side facing inward. The following were added sequentially, a) 900 μL of guanidinium isothiocyanate-based "Lysis buffer I" [5M guanidinium isothiocyanate, 25mM EDTA, 25mM sodium citrate (pH 7.0), 25mM phosphate buffer (pH 6.6)] containing 1% N-Lauroylsarcosine, 2% Triton X-100, 2% CTAB and 2% PVP, b) 18μl βmercaptoethanol, c) 300μl H2O, mixed thoroughly by inversion, and each tube was incubated at 4 °C on horizontal rotator (50 rpm) for 10-30 min. 1200 μl of "Lysis buffer II" were added [prepared by mixing 152.5gr guanidinium hydrochloride, 31.25 ml of 2M acetate buffer (pH 3.8) and water-saturated phenol stabilized (pH4), up to 500ml final volume], followed by incubation 10 min / RT on horizontal rotator (150 rpm). The liquid phase was transferred on a 2-mL microcentrifuge tube and clarified by centrifugation (21,000×g, 5 min, 4 °C). 1600 μl were transferred to a new tube, 200 μl chloroform-isoamyl alcohol (24:1) were added and shaken vigorously for 30 s followed by incubation (−20 °C, 30 min) and centrifugation (21,000×g, 10 min, 4 °C). 800 μl of the upper aqueous phase were transferred and mixed with 667μl isopropanol and 20μl of magnetic beads (IDEXX Water DNA/RNA Magnetic Bead Kit), followed by incubation 15 min / RT on horizontal rotator (150 rpm). The beads were washed according to the manufacturer's protocol, using the magnetic extraction reagents. Elution of RNA was done in 100 μl buffer followed by filtering using OneStep PCR inhibitor removal kit (Zymoresearch). SARS-CoV-2 quantified RNA (1x 106.3 genomic copies) from human clinical samples was added to a subset of concentrates to estimate the recovery efficiency and reproducibility of the RNA extraction procedure. Similarly, heat inactivated SARS-CoV-2 from human clinical samples was spiked (1x 107 viral particles) in a subset of sewage samples to assess the recovery and reproducibility of the virus concentration procedure. Each RNA extract was subjected to real-time RT-PCR for SARS-CoV-2 quantification in triplicates. Two primer/probe sets were utilized: the N2 set from CDC that targets the nucleocapsid (N) gene and the set targeting the genomic region that encodes the E protein . The assays were performed . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 6, 2021. ; on a CFX96 Touch™ Real-Time PCR Detection System (Bio-Rad Laboratories, Hercules, CA, USA). Reactions were considered positive if the cycle threshold was below 40 cycles. Calibration curves were generated using the synthetic single-stranded RNA standard "EURM-019" (Joint Research Centre, European Commission). The possible presence of RT-PCR inhibitors in each RNA extract was assessed in duplicates using spiked EURM-019 included at ~1000 copies in additional RT-PCR reactions. Inhibition was expressed as % reduction in reported copy number, compared to the sum of spiked EURM-019 copies and the mean value of measured SARS-CoV-2 genomic copies in the non-spiked RNA. SARS-CoV-2 viral load in each sewage sample was expressed as mean ± standard deviation genome copies per liter, after correcting for RT-PCR inhibition if present, and recovery efficiencies (virus concentration and RNA extraction). Precision of each individual sewage sample quantification was assessed using the coefficient of variation (CV) of the estimated SARS-CoV-2 viral load of the three electronegative membranes processed. We set our precision threshold at 35% CV for each sewage sample measurement. Daily numbers of COVID-19 infected people reported in the city of Thessaloniki, adjusted to specimen collection date, were obtained from the National Public Health Organization. These data reflect residents found positive when tested in public and private laboratories. Data are presented for the period between September 1 st , 2020 and January 6 th , 2021 (original data are provided in supplementary materials). A model is developed for estimating the evolution of virus shedding rate to sewage system based on the official (announced by the state) results of daily positive tests (cases), registered by the date of specimen collection and not by the date of public reporting. There is typically a delay of few days between specimen collection and public reporting. The model is first developed in a generalized continuous time regime and then it is transformed to its discrete counterpart dictated by the daily basis data. Let us denote as f(t) (t: calendar day) the evolution of positive test counts density with f(t)dt being the number of positive tests in the time period between t and t+dt. The first step is to transform the function f(t) to the function F(t) which denotes the total number of reported infected people at time t. It is essential to stress here that F(t) includes people that at time t either had already a positive test or they are in the first days after infection, so do not have symptoms, but still they shed virus and will be tested positive later after their onset of symptoms. The effect of infected but unreported people, such as asymptomatic ones and those not tested because of mild symptoms, is dealt with later. In order to proceed let us consider the course of the disease of an infected person: Infection starts (disease incidence) at day τ=1, detection occurs (specimen collection) at day τ=τd and end of viral shedding occurs at day τe. The number of days for detection and end of shedding are in general not the same among cases but they show a dispersion which can be described by the respective probability density functions Pd(τd) and Pe(τe). It is understood that both functions take non-zero values only in restricted domains of their arguments. The limits of these domains are denoted as τd1, τd2 and τe1, τe2, respectively. The functions Pd and Pe satisfy the conditions: . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 6, 2021. This relation is actually the mathematical expression of the statement that infected persons at time t include those already detected and those to be detected in the next days. In principle, a multiplication of this number by the average shedding rate per person would give the required global shedding rate evolution function R(t). Such averaging over all infected individuals would have been sufficient for data reduction purposes only if the shedding rate of individuals during their disease cycle had a constant value. However, as already explained, this is not true since there is a strong variation in virus concentration in stool not only among infected individuals but also across the days of their disease cycle. Let us denote this variable function of the daily shedding rate per person as S(τ), (τ: day of the disease onset). It has been shown that not only S is not a constant, but on the contrary, it is a strong function of its argument τ. In order to incorporate the effect of function S(τ) to the global shedding rate R(t), knowledge of the distribution of disease days among the infected population is required. The corresponding function is denoted as F*(τ,t) and represents the density function of the distribution of disease days, τ, at time t. It is fruitful to decompose the function F*(τ,t) into the product of the total number of infected people and the probability density function of the days of infection as F*(τ,t)=F(t)g(t,τ) where g(t,τ) is a probability density function with respect to τ and satisfies the condition: It can be shown that the function g can be derived as . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 6, 2021. where U is a step function taking the values 0 for negative and 1 for positive argument. The domain of definition of the above expression is from t=τe2-τd1 to t=T-τd2 in the case of f(t) known from t=1 to t=T. In order to proceed, one should know the function of virus shedding rate in stool per person and per day of the disease. A first question is how the onset of infection (i.e., time τ=1) is defined. There are several possibilities for this definition but in the present context the most relevant one is that τ=1 day is the first day of non-zero shedding of virus in the stool of a person. There are only a few available clinical studies in literature (e.g. Wölfel et al. (Wölfel et al., 2020) , Huang et al. (Huang et al., 2020) ; Tan et al. ) referring to the kinetics of SARS-CoV-2 shedding rate in stool during the course of disease (from τ=1 to τe). To our best knowledge, from those studies only the work of Wölfel et al. (Wölfel et al., 2020) reports actual viral concentrations whereas all other studies refer to Ct values from molecular analysis. It must be mentioned that these studies present data exclusively from hospitalized patients, thus from people presenting moderate to severe symptoms. Apparently, it is not easy to collect stool samples from infected persons prior the onset of symptoms, and as a result, asymptomatic or mild cases are not registered in clinical studies. A first common observation in these studies is that virus concentration in stool among infected persons vary by several orders of magnitude. Therefore, here an average shedding function per person will be employed and the significance of variations among individuals will be discussed on statistical terms. Interestingly, many past studies assumed in addition a uniform with respect to time shedding rate despite the orders of magnitude variability across the course of the disease (Ahmed et al., 2020; Gonzalez et al., 2020; Medema et al., 2020a; Saththasivam et al., 2021) . This assumption simplifies the algebra of the problem enormously but it is incorrect. A thorough fitting procedure of the virus titers data in stool of Wölfel et al. (Wölfel et al., 2020) , led to a peculiar two-steps model (Miura et al., 2021) : a first step with zero viral shedding, where virus load simply accumulates in infected hosts up to a maximum concentration reached at the day of symptoms onset, and a subsequent second step characterized by viral shedding at an exponentially decreasing concentration over the days of the disease. This fitting yields a Gamma function which degenerates to an exponential one. Evidently, it is quite arbitrary to assume that there is an initial viral accumulation period up to a maximum concentration without any shedding at all. Here, an even more general parameterization is introduced. The average over the infected persons function is represented as a product of the following factors: (1) the average stool amount produced by a person per day, A (gstool/day), (2) the maximum in time (averaged over infected individuals) virus concentration in stool, B (gvirus/gstool) and (3) a time distribution function s(τ) of this concentration. This distribution is assumed to consist of an exponential increase from a minimum initial value up to a maximum value being followed by an exponential decrease down to a minimum final value. These . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 6, 2021. minimum initial and final values of the distribution are assumed to be 1% of the maximum value. As a result, the whole distribution spans shedding rates over two orders of magnitude. This assumption may be changed to any other option, e.g., to 1‰ (three orders of magnitude span), but the approach remains the same. However, a two orders of magnitude span represents adequately most of the shedding of infected individuals during a typical course of the disease, so it is adopted herein. A new parameter is introduced, τa, which denotes the value of τ days at which the maximum virus concentration in stool, i.e. maximum shedding rate, occurs. Summarizing, the above the function S(τ;τa,τe) is given as: The shape of the function s(τ) for several pairs of (τa,τe) -(6,32), (12,32), (9,25) -is shown in Figure 1 . The particular parametrization of the function S proposed in the present work is more realistic than the Miura et al. (Miura et al., 2021) model, as it incorporates viral shedding even in the early days of the disease and before the peak maximum (peak) viral concentration in stool is reached. Such early shedding in stool is in line with clinical studies reporting patients with shedding in oropharyngeal swabs and gastrointestinal symptoms (Siegel et al., 2020) a few days before the appearance of respiratory symptoms. Nevertheless, one should keep in mind that respiratory shedding is only a proxy for fecal shedding. In addition, the proposed function S is very attractive because there is one to one correspondence between the parameters and major features of the function. The height (amplitude) of S is determined by the product AB, its time length is determined by the parameter τe and its skewness by the parameter τa. In particular, the closer τa is to τe/2 the smaller the skewness is. It must be stressed that the proposed parametrization of S does not require explicit information on the duration of the incubation period after infection nor on the day of symptoms onset. What actually matters in the context of the present analysis is the day of the maximum shedding rate as this is described by the parameter τa. Some clinical studies indicated that this maximum might occurs at the end of the incubation period which is taken also as the day of symptom onset (Huang et al., 2020; Miura et al., 2021; Tan et al., 2020) . However, in the present formulation this day is flexible and can be anytime from the moment of infection to the end of the shedding period (i.e., the active days of the disease). The development up to now is rather complex and includes several unknown probability density functions. In the absence of any information about them, it is convenient to consider them as Dirac delta functions. In this way, we assign to each distribution its average value. This approach not only simplifies considerably the mathematical problem but it also allows -through sensitivity analysis-to assess bounds on the effect of using a different distribution than the Dirac delta function. This is based on the principle that any secondary feature of a distribution has much smaller effect on the result than its average value. By considering the relations Pd(τd)=δ(τd-τd,av) and Pe(τe)=δ(τe-τe,av) (where δ denotes the Dirac delta function) and substituting them in equations (2), (4), (7) leads after some algebra to (the subscript "av" is dropped in the following for clarity): . CC-BY-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted July 6, 2021. ; https://doi.org/10.1101/2021.07.04.21259903 doi: medRxiv preprint The above set of equations is discretized to be compatible with the present data. These data refer to daily values of virus shedding rate so a finite volume discretization is followed here. Let us denote as fi the number of positive medical tests at day i (measurement period i=1 to N), Fi is the total number of infected people (that is daily cases) at day i (with the specimen of their positive test collected at day i), Ri the shedding rate at day i and gi,j the probability of being at the j-th day of the disease at the calendar day i. The governing equations take the form: for i>m and i