key: cord-0276009-ft9qcyf8 authors: van Boven, M.; Hetebrij, W. A.; Swart, A. M.; Nagelkerke, E.; van der Beek, R. F.; Stouten, S.; Hoogeveen, R. T.; Miura, F.; Kloosterman, A.; van der Drift, A.-M. R.; Welling, A.; Lodder, W. J.; de Roda Husman, A. M. title: Modelling patterns of SARS-CoV-2 circulation in the Netherlands, August 2020-February 2022, revealed by a nationwide sewage surveillance program date: 2022-05-30 journal: nan DOI: 10.1101/2022.05.25.22275569 sha: 73e0576c80e2765fb20f57662ba41f59e9e3868a doc_id: 276009 cord_uid: ft9qcyf8 SUMMARY Background Surveillance of SARS-CoV-2 in wastewater offers an unbiased and near real-time tool to track circulation of SARS-CoV-2 at a local scale, next to other epidemic indicators such as hospital admissions and test data. However, individual measurements of SARS-CoV-2 in sewage are noisy, inherently variable, and can be left-censored. Aim We aimed to infer latent virus loads in a comprehensive sewage surveillance program that includes all sewage treatment plants (STPs) in the Netherlands and covers 99.6% of the Dutch population. Methods A multilevel Bayesian penalized spline model was developed and applied to estimate time- and STP-specific virus loads based on water flow adjusted SARS-CoV-2 qRT-PCR data from 1-4 sewage samples per week for each of the >300 STPs. Results The model provided an adequate fit to the data and captured the epidemic upsurges and downturns in the Netherlands, despite substantial day-to-day measurement variation. Estimated STP virus loads varied by more than two orders of magnitude, from approximately 1012 (virus particles per 100,000 persons per day) in the epidemic trough in August 2020 to almost 1015 in many STPs in January 2022. Epidemics at the local levels were slightly shifted between STPs and municipalities, which resulted in less pronounced peaks and troughs at the national level. Conclusion Although substantial day-to-day variation is observed in virus load measurements, wastewater-based surveillance of SARS-CoV-2 can track long-term epidemic progression at a local scale in near real-time, especially at high sampling frequency. The SARS-CoV-2 pandemic has posed an unprecedented threat to public health in modern times. The virus can cause the respiratory and systemic disease COVID-19, but in general severity and progression of the disease are variable [1] , depending on host risk factors [2] and pathogen variants [3] . It is known that transmission can also occur via asymptomatic or pre-symptomatic individuals [4] . These characteristics hamper effective epidemiological surveillance for the infection dynamics, as case notifications are biased, depending on strain-dependent severity of disease, willingness to get tested, testing capacity, and public health policies. Epidemiological surveillance based on hospital admissions is less biased, but it does not provide an accurate picture in the early and late stages of an epidemic when hospitalisations are rare. This is especially true at a local scale (e.g., municipalities) and for variants causing relatively mild disease (such as the Omicron variant relative to the Delta variant). Serological surveillance for SARS-CoV-2 can provide an unbiased population-level estimate of the fraction of the population that has been infected [5] . Serological surveys, however, are costly and each survey only provides a cross-sectional snapshot of the population at a single time point. In contrast with earlier applications of wastewater-based surveillance such as the identification of emerging enteroviruses [6] , monitoring of SARS-CoV-2 in sewerage can supply quantitative information at a local scale [7] . In the COVID-19 pandemic, the detection of SARS-CoV-2 in feces [8] has accelerated the introduction of wastewater surveillance in a variety of settings, including airports, hospitals, and cities. Wastewater-based surveillance has now been implemented in many countries globally [9] . However, many of these activities are carried out by local governmental bodies or independent research groups and are often restricted to specific locations. This makes it difficult to compare variations over time and geographically in a broader perspective. Here we report data and analyses from a national wastewater-based surveillance program in the Netherlands. Since September 7 th , 2020, all sewage treatment plants (STPs) in the Netherlands provide 1 to 4 samples per week to the Dutch National Institute for Public . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. Health and the Environment. Since almost every Dutch household is connected to a sewer, coverage of the program is close to 100%. Samples are subjected to a standardised extraction protocol and analysed within 2-4 days after sampling by real-time RT-PCR. As individual measurements of SARS-CoV-2 RNA in sewage are noisy, inherently variable, and can be left-censored (i.e., not able to detect RNA at low virus loads), our aim is to integrate the available data to provide estimates of the latent true SARS-CoV-2 virus loads over time for all STPs. We provide an integrated analysis of the data from August 1, 2020 (when 60% of the population had already been included) up to and including February 8, 2022 (when more than 99% of the population had been included). Hence, the time series data cover the winter epidemic of 2020-2021, the 2021 summer surge after many restrictions had been lifted, and the 2021-2022 winter epidemic. Our analyses employ a multilevel Bayesian model with flexible smoothing functions (splines) describing virus load trends at the level of STPs, enabling information sharing both within and between STP time series. We discuss how sewage surveillance could help anticipate and prepare for future epidemics of SARS-CoV-2. In the Netherlands, sewage of more than 99% of the population is treated at 317 STPs, of which 4 closed between start and end of the period of analysis ( Figure 1 ). Coverage of the program is over 99% of the Dutch population (>17 million) as only a small fraction of the population in the Netherlands is not connected to a sewage system. STPs vary in size from covering approximately 1,000 to over 800,000 inhabitants. Sewage samples are collected using a 24h flow proportional sampling device from which 500-1000 mL is sent to the laboratory. Quantification of SARS-CoV-2 RNA in the samples was based on a standard curve. Details of sampling and experimental procedures will be provided elsewhere (Lodder et al, in preparation) . . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) We analyse log 10 -transformed virus concentration data, normalised by 24h sewage water flow rates (unit: virus particles per 100,000 persons per day; henceforth called virus load) [10] using a Bayesian multilevel spline model with random effects at the level of the STPs [11] [12] [13] . In the model, the expected log 10 -transformed virus load at time and STP , , is given by where represents the national trend, and represent the STP-specific deviations. The national trend is modelled with a penalised spline (p-spline) with cubic b-spline basis functions and 15 equidistant knots (yielding 17 regression coefficients). Results presented here are obtained with a first-order random walk (RW1) prior for the regression coefficients [12] , using a 12,1 prior distribution for the first regression coefficient and Inverse-Gamma(1, 0.0005) prior distribution for the variance parameter [14] . In this manner, the national log 10 -transformed virus load at the first time point is normally distributed a priori with a mean of 12 (per 100,000 persons per day) and approximate 95% prior range of 10-14 and with zero-mean normal prior distributions ( 0,1 ). Hence, plant-specific deviations are a priori expected to be no more than approximately 100-fold (i.e., 2 standard deviations) lower or higher than the trend. We also explored a suite of alternatives, e.g., estimating the variance of the deviations, or replacing the p-spline for the national virus load with a bspline using 13,1 prior distributions for the regression coefficient. Throughout, we assume that the log 10 -transformed virus load measurements at plant and time are normally distributed ( ~ , ), and we use an uninformative (improper uniform) prior distribution for 0. The probability of virus detection in a sample is determined by the detection limit of the qRT-PCR, the daily flow rates of water through the sewerage, and possibly also by the composition of the sewage. Hence, there is no fixed predefined cut-off for RNA detection . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 30, 2022. ; https://doi.org/10.1101/2022.05.25.22275569 doi: medRxiv preprint that can be applied to all STPs at all time points, and we include a two-parameter logistic function that determines the probability of detection in the analyses. The parameters (load at which detection occurs with a probability of 0.5) and (steepness of the detection curve) are estimated. Hence, denoting the probability of detection by detection | where and denote the log Bernoulli and log normal probability densities, respectively. To tie the results at the STP level to other organisational levels (municipality, safety region, national) we apply posterior weighting of the estimated virus loads (i.e., the exponentiated estimated log-loads), where weighting is proportional to the numbers of persons contributing to the various units of organisation. Demographic data is obtained from Statistics Netherlands using the census of 1 January 2020 [15] , with the exception of the municipal mergers of January 2021 and 2022 which were manually added to the demographic data. Of note, the analyses reported here include all but the three overseas extraordinary municipalities. We analyse sewage data of more than 47,000 measurements spanning the period 1 August 2) as interface to Stan [16] . We run 10 MCMC chains in parallel and base the analyses on 1,000 samples from well-mixed chains, where we . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The Dutch sewage surveillance program was initiated in the first months of 2020. Since its inception, the number of included STPs has increased over time such that 80 STPs provided samples in August 2020, that all 313-317 STPs provided samples from September 2020 onwards, and that the frequency of sampling increased from 1 to 4 samples per week. An overview of the location and size of STPs in the Netherlands is provided in Figure 1 , and shows that STPs in the densely populated western parts of the Netherlands are generally significantly larger than those in other regions. Noteworthy, STPs often process sewage from multiple municipalities, and it is not uncommon that a municipality is being served by multiple STPs. Hence, STPs do not follow the hierarchical organisational structure of the Dutch administrative divisions, which makes averaging from STP-level data and results to municipalities, provinces, and safety regions not straightforward. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 30, 2022. ; https://doi.org/10.1101/2022.05.25.22275569 doi: medRxiv preprint Finally, Figure 5 shows the inferred log-transformed virus loads in the Netherlands, using the population-weighted STP estimates, together with the log-transformed daily total number of hospitalisations (https://data.rivm.nl/covid-19/). Again, overall patterns of estimated virus loads are similar to those observed in Figures 2 and 4 . Interestingly, epidemic troughs and peaks in the national estimate are less pronounced than those observed in most STPs. For instance, virus load estimates were as low as 10 12 in many STPs in August 2020 but almost 10 13 at the national level, and almost 10 15 in the February 2022 peak but only just over 10 14 at the national level. The observation that local epidemics are not fully synchronised is the reason that the national trend is less pronounced than those in the individual STPs (Figure 2 ). In fact, there is up to tenfold variation from one STP to the next in Over 99% of the Dutch population is connected to the sewerage and thus to a STP. This makes it possible, in principle, to perform high resolution and near-real time surveillance of pathogens that are shed in feces, urine, and other excreta and secreta [8, 17] . However, sewage is a complex matrix to analyze [18] , and proper controls are necessary to accurately quantify SARS-CoV-2 RNA. It is known that the amount of precipitation and the composition of industrial wastewater that is mixed in with the sewage of the general population will affect SARS-CoV-2 RNA concentrations determined by RT-PCR [19] . Still other factors that can affect measurements include the length of sewer lines and ambient temperature [20] . Here we have shown that even when there is substantial noise or day-to-day variation in RNA concentrations detected in sewage, it is possible to reliably estimate variations in underlying latent SARS-CoV-2 virus loads over time. This is possible because virus loads in sewage increase and decline exponentially during epidemic outbreaks, often by more than . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 30, 2022. ; two orders of magnitude. These very strong variations over time dwarf the noise in the signal, which is determined by the standard deviation of the log 10 -transformed virus load measurements ( ). In the analyses, the standard deviation is estimated at =0.353, such that individual measurements can be approximately fivefold lower or higher than the estimated latent virus load. Our analyses using multilevel splines provide a convenient research tool to analyse sewage data, as it enables a natural borrowing of information from the national virus load and samples taken around the same time. In this manner, we have been able to infer trends over time in the latent true virus loads for all 317 STPs in the Netherlands, as well as for a variety of other organisational levels (e.g., municipality, safety region, province, national). Specifically, the results provide estimates of the latent virus loads at all time points, and in particular for STPs on days that no measurements are available. In principle, our method of analysis can be extended further to any level of organisation that is built up from 4-digit postal codes in the Netherlands. In addition, subsampling from sewer lines in STPs with an unusually high virus load is also possible [21] . Especially in large STPs (such as Amsterdam-West and other STPs in Figure 2 ) there may be substantial differences between city districts with respect to SARS-CoV-2 prevalence, possibly associated with risk factors such as vaccination coverage and socio-economic status [22] . Although our analyses provide an adequate fit to the data, we do not claim that the results are in any way optimal for all SPTs at all time points. Rather, they provide a relatively parameter-sparse description of the data, linking the 317 STP time series through the national virus load while mildly favouring small STP-specific deviations from the national trend through the random effects. This seemed to work well in general, but it should be noted that the model has difficulty following very sudden changes in virus load data. A prominent example is presented by the sudden and strong drops in virus loads around October 2021 (Figure 2 ). This is only partially resolved by adding more knots to the splines, at the cost of greatly increasing computation times (not shown). Future extensions could also focus on adding STP-specific measurement noise, or perhaps even directly modelling . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 30, 2022. ; the PCR data and water flow rates jointly. Here we would like to mention that it is not easy, even in the idealised context of large sample size, to select the optimal model from competing models using information criteria in a time series setting [23] . Our statistical modelling has provided a description of trends in virus loads at a local level by properly weighing all available wastewater measurements. In contrast, other modelling studies have focused on relating national or subnational virus loads to case notification data or hospitalisations using mechanistic modelling, with the aim to estimate the indicidence and prevalence of the number of infections over time ( [24, 25] and references therein). The two approaches serve different purposes, and each has its strengths and limitations. A main strength of the mechanistic modelling is that all parameters have a biological interpretation, and that these analyses can be used for scenario studies. The results from such analyses, however, depend critically on model assumptions and are surrounded with large uncertainties. Our analyses do not yield a mechanistic interpretation but give precise estimates of latent virus loads that arguably are less dependent on specific model assumptions. In ongoing work, we aim to merge the two approaches by fitting transmission models at a local scale using generalized profiling [26] . Since the infrastructure of receiving sewage samples are in place, the detection of other viruses can be added to the Dutch sewage surveillance program. These might include rotavirus and enteroviruses but also influenza viruses [27, 28] , thus providing a comprehensive surveillance tool for pandemic preparedness. Moreover, sewage surveillance for antimicrobial resistance has already shown its potential [29] . In principle, our methods of analysis can directly be applied to other targets and can deal with noise and unbalanced data in a principled manner. The SARS-CoV-2 sewage surveillance program in the Netherlands has contributed to integrating available sewage data in a coherent framework, and also to informing the Dutch government on national and regional trends in SARS-COV-2 circulation [30] . Specifically, . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 30, 2022. ; data at the national level have been used by the Dutch Ministry of Health, Welfare and Sport, and data from local analyses by the Municipal Health Services. At present, surveillance for SARS-CoV-2 is not only used for early detection of upticks in virus loads but increasingly also for SARS-CoV-2 surveillance in general, in view of increasing bias in testing data and decreasing numbers of hospitalisations. By integrating parameters such as agestratified vaccination coverage and variant emergence, next aims are to explain age-specific hospitalisation admissions in the Netherlands from age-specific vaccination coverages, Full data for all 345 municipalities in the Netherlands are available at https://github.com/rivm-syso/SARS-CoV-2_sewage. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. Estimating the asymptomatic proportion of SARS-CoV-2 infection in the general population: Analysis of nationwide serosurvey data in the Netherlands Presenting Characteristics, Comorbidities, and Outcomes Among 5700 Patients Hospitalized With COVID-19 in the New York City Area Mortality and critical care unit admission associated with the SARS-CoV-2 lineage B.1.1.7 in England: an observational cohort study Asymptomatic infection and transmission of COVID-19 among clusters: systematic review and meta-analysis Associations Between Measures of Social Distancing and Severe Acute Respiratory Syndrome Coronavirus 2 Seropositivity: A Nationwide Population-based Study in the Netherlands Identification of multiple potential viral diseases in a large urban center using wastewater surveillance SARS-CoV-2 in wastewater: potential health risk, but also data source First Case of 2019 Novel Coronavirus in the United States A State-of-the-Art Scoping Review on SARS-CoV-2 in Sewage Focusing on the Potential of Wastewater Surveillance for the Monitoring of the COVID-19 Pandemic SARS-CoV-2 open data: National Institute for Public Health and the Environment 2022 Investigating Determinants of Multiple Sclerosis in Longitunal Studies: A Bayesian Approach Flexible smoothing with P-splines: a unified approach Bayesian data analysis Inhabitants by STP-municipality: Statistics Netherlands A Probabilistic Programming Language Future perspectives of wastewater-based epidemiology: Monitoring infectious disease spread and resistance to the community level Defining biological and biophysical properties of SARS-CoV-2 genetic material in wastewater The detection and stability of the SARS-CoV-2 RNA biomarkers in wastewater influent in Helsinki Factors influencing SARS-CoV-2 RNA concentrations in wastewater up to the sampling stage: A systematic review Spatially resolved qualified sewage spot sampling to track SARS-CoV-2 dynamics in Munich -One year of experience Differences in SARS-CoV-2 infections during the first and second wave of SARS-CoV-2 between six ethnic groups in Amsterdam, the Netherlands: A population-based longitudinal serological study. The Lancet Regional Health -Europe Approximate leave-future-out cross-validation for Bayesian time series models First detection of SARS-CoV-2 genetic material in the vicinity of COVID-19 isolation Centre in Bangladesh: Variation along the sewer network A wastewaterbased epidemic model for SARS-CoV-2 with application to three Canadian cities Parameter estimation for differential equations: a generalized smoothing approach Quantification and Trends of Rotavirus and Enterovirus in Untreated Sewage Using Reverse Transcription Droplet Digital PCR Wastewater-based detection of an influenza outbreak Nationwide surveillance reveals frequent detection of carbapenemase-producing Enterobacterales in Dutch municipal wastewater Dashboard: Ministry of Health, Welfare and Sport We are indebted to representatives of the Dutch Water Authorities, in particular Mark van der Werf, Imke Leenen, and Bert Palsma, for facilitating the current study. We would like to thank all STP personnel working in water laboratories responsible for the actual sampling of wastewater. We gratefully acknowledge comments on the manuscript by Susan van den Hof and Rianne van Gageldonk-Lafeber. This research was financed by the Dutch Ministry of