key: cord-0749153-hlegieoy
authors: Xiao, Amy; Wu, Fuqing; Bushman, Mary; Zhang, Jianbo; Imakaev, Maxim; Chai, Peter R; Duvallet, Claire; Endo, Noriko; Erickson, Timothy B; Armas, Federica; Arnold, Brian; Chen, Hongjie; Chandra, Franciscus; Ghaeli, Newsha; Gu, Xiaoqiong; Hanage, William P; Lee, Wei Lin; Matus, Mariana; McElroy, Kyle A; Moniz, Katya; Rhode, Steven F; Thompson, Janelle; Alm, Eric J
title: Metrics to relate COVID-19 wastewater data to clinical testing dynamics
date: 2022-01-14
journal: Water Res
DOI: 10.1016/j.watres.2022.118070
sha: 30c453238b5ba277247646512774c4e7e88db4d8
doc_id: 749153
cord_uid: hlegieoy

Wastewater surveillance has emerged as a useful tool in the public health response to the COVID-19 pandemic. While wastewater surveillance has been applied at various scales to monitor population-level COVID-19 dynamics, there is a need for quantitative metrics to interpret wastewater data in the context of public health trends. 24-hour composite wastewater samples were collected from March 2020 through May 2021 from a Massachusetts wastewater treatment plant and SARS-CoV-2 RNA concentrations were measured using RT-qPCR. The relationship between wastewater copy numbers of SARS-CoV-2 gene fragments and COVID-19 clinical cases and deaths varies over time. We demonstrate the utility of three new metrics to monitor changes in COVID-19 epidemiology: (1) the ratio between wastewater copy numbers of SARS-CoV-2 gene fragments and clinical cases (WC ratio), (2) the time lag between wastewater and clinical reporting, and (3) a transfer function between the wastewater and clinical case curves. The WC ratio increases after key events, providing insight into the balance between disease spread and public health response. Time lag and transfer function analysis showed that wastewater data preceded clinically reported cases in the first wave of the pandemic but did not serve as a leading indicator in the second wave, likely due to increased testing capacity, which allows for more timely case detection and reporting. These three metrics could help further integrate wastewater surveillance into the public health response to the COVID-19 pandemic and future pandemics.

The coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has continued to affect all aspects of global life since its emergence in late 2019. Efforts to contain its spread have relied on public health measures such as social distancing, stay-at-home orders, mandatory wearing of face coverings in public, and community-based SARS-CoV-2 testing (Nussbaumer-Streit et al., 2020). In the face of this public health emergency, multiple methods have been designed to assess the impact of SARS-CoV-2 on critical health infrastructure, implement rapid and robust individual testing, and predict potential surges to inform hospital preparedness and health policy makers.

Clinical surveillance data is the gold standard for evaluating the state of the pandemic. However, clinical data can be limited by clinical testing capacity and availability, human behavior, and the presence of asymptomatic infections. In contrast, wastewater surveillance, also known as wastewater-based epidemiology (WBE), is a potentially powerful strategy for near real-time monitoring of viral burden in the population, as it captures viral shedding from infected individuals irrespective of clinical presentation. Such surveillance has been shown to detect SARS-CoV-2 in wastewater before widespread clinical reporting (Bar-Or et al., 2020; Kocamemi et al., 2020; Medema et al., 2020; Randazzo et al., 2020; Wurtzer et al., 2020). It has previously been demonstrated that viral copy numbers in wastewater were higher than expected from confirmed clinical cases (Peccia et al., 2020a; Randazzo et al., 2020; Wu et al., 2020a, 2020b) and that they preceded clinical reporting of new cases and hospital admissions (Peccia et al., 2020a; Randazzo et al., 2020; Wu et al., 2020a), suggesting that wastewater surveillance could be used as an early warning system. Wastewater surveillance has also been used in combination with clinical data to infer average population-level viral shedding dynamics (Schmitz et al., 2021; Wu et al., 2020a).

Wastewater surveillance has mostly been implemented at municipal wastewater treatment plants, but as colleges and universities reopened, it has also been applied in dormitories as an early warning system to prevent large-scale outbreaks of COVID-19 (Betancourt et al., 2020; Gibas et al., 2021; Harris-Lovett et al., 2021).

To interpret wastewater surveillance results in terms of population health, it is critical to understand the relationship between wastewater SARS-CoV-2 concentrations and infections in the population. Early work focused on evaluating the correlation between wastewater SARS-CoV-2 concentrations and clinically reported cases and characterizing the lead time of wastewater surveillance data. Various groups used correlation methods to estimate this lead time, with reports of wastewater leading clinical cases by 0-16 days (Medema et al., 2020; Nemudryi et al., 2020; Peccia et al., 2020b; Randazzo et al., 2020; Wu et al., 2020a; Wurtzer et al., 2020) . While the correlation between the wastewater and clinical curves is relatively easy to compute, it does not take into account autocorrelation in the datasets, so further work has employed more sophisticated statistical models to estimate the time lag (Galani et al., 2021; Peccia et al., 2020a) . Some groups have used wastewater data to infer the number of infected cases in the catchment (Ahmed et al., 2020; Chavarria-Miró et al., 2021; Galani et al., 2021; Gerrity et al., 2021; Krivoňáková et al., 2021) , while others have used wastewater data to model average population shedding rates (Cavany et al., 2021; Wu et al., 2022) .
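The simple correlation approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any of the cited studies: the hypothetical helper `best_lag_by_correlation` shifts one daily series against the other and keeps the shift with the highest Pearson correlation, searching ±16 days to match the range reported above.

```python
import numpy as np


def best_lag_by_correlation(wastewater, cases, max_lag=16):
    """Return (lag, r): the shift in days that maximizes the Pearson correlation
    between two equal-length daily series.

    A positive lag means wastewater leads cases: wastewater on day t is paired
    with cases on day t + lag. Hypothetical helper for illustration only.
    """
    ww = np.asarray(wastewater, dtype=float)
    cc = np.asarray(cases, dtype=float)
    n = len(ww)
    best_lag, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = ww[: n - lag], cc[lag:]   # wastewater earlier than cases
        else:
            x, y = ww[-lag:], cc[: n + lag]  # cases earlier than wastewater
        if len(x) < 3:
            continue  # too few overlapping days for a meaningful correlation
        r = np.corrcoef(x, y)[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Synthetic example: the case curve is the wastewater curve delayed by 6 days.
days = np.arange(120)
wastewater = np.exp(-(((days - 50) / 10.0) ** 2))
cases = np.exp(-(((days - 56) / 10.0) ** 2))
lag, r = best_lag_by_correlation(wastewater, cases)
print(lag, round(r, 3))  # → 6 1.0
```

Because smooth epidemic curves are strongly autocorrelated, neighboring lags score almost as well as the best one, which is the limitation that motivates the more careful models cited above.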

These modeling studies shed important light on the relationship between wastewater viral concentrations and clinical cases. However, municipalities often rely on qualitative trends when using wastewater SARS-CoV-2 concentrations to inform decisions (Cambridge Public Schools, 2021; Centers for Disease Control, 2021; Ohio Coronavirus Wastewater Monitoring Network, 2021), whereas wastewater data can be a richer source of public health information when integrated with other datasets. There remains a need for quantitative metrics to help municipalities interpret wastewater data in the context of clinical trends and public health interventions, and to evaluate their pandemic response. Such metrics would help municipalities incorporate wastewater data into the public health decision-making toolkit more effectively, beyond observing qualitative trends in wastewater SARS-CoV-2 copy numbers. Here, we developed three new metrics and applied them to a 14-month-long time series of wastewater SARS-CoV-2 copy numbers in Massachusetts. These three metrics, namely (1) the ratio between wastewater SARS-CoV-2 copy numbers and clinical cases (WC ratio), (2) the time lag between wastewater and clinical reporting, and (3) a transfer function between the wastewater and clinical case curves, describe dynamic changes in the relationship between wastewater data and clinical data as the pandemic evolves and management strategies adapt. Our results show that wastewater data preceded clinically reported cases in the first wave of the pandemic but did not serve as a leading indicator in the second wave, likely due to increased testing capacity, which allows for more timely case detection and reporting. Thus, these metrics can be useful tools for officials to evaluate the public health response over time.

Samples were transported to the lab on ice on the day of sample collection. If samples were collected on weekends or holidays, they were stored at 4 °C and transported to the lab on ice the next business day. Upon receipt, samples were pasteurized at 60 °C for 1 hour to inactivate the virus, as required by biosafety regulations at MIT. Several studies have found that heat treatment at 60 °C did not negatively impact quantification of SARS-CoV-2 (Y. Pastorino et al., 2020) or PMMoV (Shirasaki et al., 2020).

Samples were analyzed using previously described methods (Wu et al., 2020a). Briefly, samples were filtered to remove large particulate matter using a 0.2 µm vacuum-driven filter (EMD-Millipore SCGP00525 or Corning 430320, depending on sample turbidity). We used Amicon Ultra-15 centrifugal ultrafiltration units (Millipore UFC903096) to concentrate 15 mL of wastewater approximately 100-fold. Viral particles in this concentrate were immediately lysed by adding AVL Buffer containing carrier RNA (Qiagen 19073) to the Amicon unit, followed by transfer to a 96-well 2 mL block and incubation for more than 10 minutes. Ethanol (100%) was added to the lysate, and samples were applied to RNeasy Mini columns or RNeasy 96 cassettes (Qiagen 74106 or 74181). RNA samples were analyzed by one-step RT-qPCR (ThermoFisher 4444436) in triplicate for the N1, N2, and PMMoV amplicons on CFX96 and/or CFX-Connect instruments using the following protocol: 50 °C for 10 min for reverse transcription, 95 °C for 20 s for RT inactivation and initial denaturation, and 48 cycles of denaturation (95 °C, 1 s) and annealing/extension (55 °C, 30 s). Cts were called from raw fluorescence data using the Cy0 algorithm from the qpcR package (v1.4-1) in R (Guescini et al., 2008) and manually inspected for agreement with the raw traces in the native BioRad Maestro software.

Previous work with murine hepatitis virus (MHV) spike-ins has shown the recovery efficiency of our process to be 31.42 ± 2.59%.

An extraction blank was processed in parallel with samples to detect contamination of extraction reagents. If the extraction blank showed detection of SARS-CoV-2 by the N1 or N2 primers with Ct < 40, or of PMMoV with Ct < 35, the data were discarded, and the samples in the batch were re-extracted starting from filtered wastewater.

Each RT-qPCR plate included a specified set of controls. Two no-template controls were assayed to detect contamination of the RT-qPCR reagents. If either no-template control showed detection of SARS-CoV-2 by the N1 or N2 primers with Ct < 40, or of PMMoV with Ct < 35, the data were discarded, and RNA samples were re-assayed with a fresh mix of RT-qPCR reagent. Four replicates of synthetic SARS-CoV-2 RNA were assayed as positive controls. If the average Ct value of these controls was not within range (±1 Ct of the expected, lot-specific Ct value), the data were discarded, and RNA samples were re-assayed with a fresh mix of RT-qPCR reagent. Matrix inhibition was assessed by manually reviewing the raw qPCR curves.
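The control acceptance rules above can be written as simple checks. The sketch below is illustrative only and assumes that wells with no amplification are recorded as NaN, that "within range" means an inclusive ±1 Ct window, and that `expected_ct` is a hypothetical lot-specific value. The same negative-control check applies to the extraction blank (failure triggers re-extraction) and to the no-template controls (failure triggers re-assay). Matrix inhibition, which was judged manually, is not modeled.

```python
import math

SARS_COV_2_CT_CUTOFF = 40.0       # N1/N2 amplification below Ct 40 in a negative control = contamination
PMMOV_CT_CUTOFF = 35.0            # PMMoV amplification below Ct 35 in a negative control = contamination
POSITIVE_CONTROL_TOLERANCE = 1.0  # mean positive-control Ct must be within +/-1 of the lot-specific value


def _detected(ct, cutoff):
    """A well counts as detected if it amplified (not NaN) below the cutoff."""
    return not math.isnan(ct) and ct < cutoff


def negative_control_clean(n1_ct, n2_ct, pmmov_ct):
    """True if an extraction blank or no-template control shows no contamination.

    Wells with no amplification are assumed to be recorded as math.nan.
    """
    return not (
        _detected(n1_ct, SARS_COV_2_CT_CUTOFF)
        or _detected(n2_ct, SARS_COV_2_CT_CUTOFF)
        or _detected(pmmov_ct, PMMOV_CT_CUTOFF)
    )


def plate_controls_pass(no_template_controls, positive_control_cts, expected_ct):
    """True if a plate can be kept; False means discard and re-assay with fresh reagent.

    no_template_controls: iterable of (n1_ct, n2_ct, pmmov_ct) tuples (two per plate).
    positive_control_cts: Ct values of the synthetic SARS-CoV-2 RNA replicates.
    expected_ct: hypothetical lot-specific expected Ct for the positive control.
    """
    if not all(negative_control_clean(*ntc) for ntc in no_template_controls):
        return False
    mean_ct = sum(positive_control_cts) / len(positive_control_cts)
    return abs(mean_ct - expected_ct) <= POSITIVE_CONTROL_TOLERANCE


nan = math.nan
print(negative_control_clean(nan, 38.2, nan))  # → False (N2 amplified in a blank)
print(plate_controls_pass([(nan, nan, nan), (nan, nan, 36.0)], [30.2, 30.5, 29.9, 30.1], 30.0))  # → True
```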

In addition to these laboratory controls, a thorough data review process was implemented for quality control. First, we used PMMoV as a proxy measure for per-extraction-batch recovery and flagged any plate with unusually low PMMoV values for further review and potential repeat processing. Furthermore, each sample was manually reviewed if it met any of the following criteria:
- PMMoV above the 99th or below the 1st percentile of all previously processed sample values;
- SARS-CoV-2 concentration changing more than 5-fold since the prior sample at the same location;
- suspected inhibition based on manual inspection of raw fluorescence curves;
- discordant virus concentrations obtained from the N1 and N2 primers;
- pigmentation present in extracted RNA.
During manual review, individual RT-qPCR replicates and timelines of SARS-CoV-2 and PMMoV concentrations were inspected. A small fraction of the samples inspected in this manner were flagged for rerun. If sufficient filtered wastewater remained in the initially processed tube, a second 15 mL aliquot was processed. An aliquot from one of the backup tubes was always processed. If the rerun results differed significantly from the initial results, the initial results were discarded. If the rerun results recapitulated the initial results, the average of all results was reported and used in this work.
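The automatable review criteria can be sketched as a function that returns the reasons a sample should be reviewed. This is a hypothetical helper, not the pipeline used in this work: inhibition and pigmentation are passed in as the outcomes of visual inspection, and `n1_n2_fold_limit` is an assumed threshold because the text does not define how large an N1/N2 discrepancy counts as discordant.

```python
import numpy as np


def manual_review_reasons(pmmov, sars_cov_2, n1, n2, history_pmmov,
                          previous_sars_cov_2=None, suspected_inhibition=False,
                          pigmented_rna=False, n1_n2_fold_limit=3.0):
    """Return the list of review criteria a sample meets (empty list = no review).

    pmmov, sars_cov_2, n1, n2: concentrations for this sample.
    history_pmmov: PMMoV values of all previously processed samples.
    previous_sars_cov_2: concentration of the prior sample at the same location.
    suspected_inhibition, pigmented_rna: outcomes of visual inspection.
    n1_n2_fold_limit: hypothetical threshold; the text does not quantify
        how far N1 and N2 must disagree to count as discordant.
    """
    reasons = []
    low, high = np.percentile(history_pmmov, [1, 99])
    if pmmov < low or pmmov > high:
        reasons.append("PMMoV outside 1st-99th percentile")
    if previous_sars_cov_2 is not None and previous_sars_cov_2 > 0:
        fold = sars_cov_2 / previous_sars_cov_2
        if fold > 5 or fold < 1 / 5:
            reasons.append("SARS-CoV-2 changed more than 5-fold")
    if suspected_inhibition:
        reasons.append("suspected inhibition")
    if max(n1, n2) > n1_n2_fold_limit * min(n1, n2):
        reasons.append("discordant N1/N2")
    if pigmented_rna:
        reasons.append("pigmented RNA")
    return reasons


history = np.arange(1.0, 101.0)  # hypothetical PMMoV history
print(manual_review_reasons(150.0, 600.0, 600.0, 600.0, history, previous_sars_cov_2=100.0))
# → ['PMMoV outside 1st-99th percentile', 'SARS-CoV-2 changed more than 5-fold']
```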

A standard curve was generated using serial dilutions of Twist Bioscience synthetic SARS-CoV-2 RNA control 2 (MN908947.3) and used to convert Ct values into copies per well. We used pepper mild mottle virus (PMMoV) as a fecal indicator and quantified it relative to SARS-CoV-2 in each sample using the standard curve for N1, because synthetic RNA for PMMoV was not available and our PMMoV normalization method depends on ratios of PMMoV values rather than absolute values. Ct values above 40 were considered non-detects. The copies per well were multiplied by a dilution factor accounting for the volume changes described above (RNA extraction, concentration, etc.) and then divided by the original sewage volume (15 mL) to convert to a sewage concentration (copies per liter). Concentrations of N1 and N2 replicates were averaged first within each primer set and then across primers to obtain the final SARS-CoV-2 concentration; replicates of the PMMoV amplicon were averaged. To be considered a detection, a sample was required to have at least two quantified replicates across N1 and N2 and at least one detected PMMoV replicate. For median normalization using PMMoV, SARS-CoV-2 concentrations were divided by the PMMoV concentration and multiplied by a reference PMMoV value, defined as the median PMMoV concentration in our dataset. For example, samples that are more diluted will have a lower PMMoV concentration and will be normalized up, while samples that are less diluted will have a higher PMMoV concentration and will be normalized down. Normalization using PMMoV has previously been shown to account for variability in wastewater flow (Wu et al., 2020b) and allows us to correct for dilution and laboratory variation simultaneously. Data on the concentration of SARS-CoV-2 viral RNA in Massachusetts wastewater are publicly available at https://www.mwra.com/biobot/biobotdata.htm.
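The conversion and normalization steps above are sketched below under explicit assumptions: a linear standard curve of Ct against log10 copies per well, non-detects skipped when averaging replicates, and a hypothetical `dilution_factor` standing in for the volume changes during concentration and extraction. Function names are illustrative, not taken from the authors' pipeline.

```python
import numpy as np

CT_NON_DETECT = 40.0  # Ct values above 40 are non-detects


def fit_standard_curve(log10_copies, cts):
    """Least-squares fit of Ct = slope * log10(copies per well) + intercept."""
    slope, intercept = np.polyfit(log10_copies, cts, 1)
    return slope, intercept


def ct_to_copies(cts, slope, intercept):
    """Invert the standard curve; non-detects (NaN or Ct > 40) become NaN."""
    cts = np.asarray(cts, dtype=float)
    copies = 10.0 ** ((cts - intercept) / slope)
    return np.where(np.isnan(cts) | (cts > CT_NON_DETECT), np.nan, copies)


def sample_concentrations(n1_cts, n2_cts, pmmov_cts, curve, dilution_factor, volume_l=0.015):
    """Return (SARS-CoV-2, PMMoV) in copies per liter, or None if not a detection.

    PMMoV is read off the same N1 curve, as in the text. dilution_factor is a
    hypothetical stand-in for the extraction/concentration volume changes.
    """
    n1 = ct_to_copies(n1_cts, *curve)
    n2 = ct_to_copies(n2_cts, *curve)
    pmmov = ct_to_copies(pmmov_cts, *curve)
    n_quantified = np.count_nonzero(~np.isnan(np.concatenate([n1, n2])))
    if n_quantified < 2 or np.all(np.isnan(pmmov)):
        return None
    # Average within each primer set, then across primer sets (non-detects skipped: an assumption).
    primer_means = [np.nanmean(x) for x in (n1, n2) if not np.all(np.isnan(x))]
    per_liter = dilution_factor / volume_l
    return float(np.mean(primer_means)) * per_liter, float(np.nanmean(pmmov)) * per_liter


def pmmov_normalize(sars_cov_2, pmmov):
    """Divide by PMMoV and rescale by the dataset-median PMMoV concentration."""
    sars_cov_2 = np.asarray(sars_cov_2, dtype=float)
    pmmov = np.asarray(pmmov, dtype=float)
    return sars_cov_2 / pmmov * np.median(pmmov)


curve = fit_standard_curve([1, 2, 3, 4, 5], [36.68, 33.36, 30.04, 26.72, 23.40])  # hypothetical dilutions
sars, pmmov = sample_concentrations([33.36, 33.36, 45.0], [np.nan] * 3, [26.72], curve, dilution_factor=1.0)
print(pmmov_normalize([10.0, 20.0, 30.0], [1.0, 2.0, 3.0]))  # → [20. 20. 20.]
```

In the last line, the sample with twice the median PMMoV is scaled down and the one with half is scaled up, which is the dilution correction described above.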

Approximate Bayesian computation (ABC) was used to infer the distribution of the delay between when an infected individual's viral shedding appears in wastewater and when they are counted as a clinical case, separately for the periods before and after August 15, 2020. ABC was chosen because it is flexible and relatively easy to implement, allowing inference for complex models without evaluating the likelihood function, which can be computationally intractable (Csilléry et al., 2010; Fearnhead and Prangle, 2011; Toni et al., 2009). Instead, ABC relies on efficient simulation, comparing the observed data with data simulated under candidate parameter values (Csilléry et al., 2010; Fearnhead and Prangle, 2011; Toni et al., 2009). First, wastewater viral copy numbers were normalized by their sum within each period. The delay δ was assumed to be normally distributed with mean μ and standard deviation σ. Prior distributions for μ and σ were chosen as follows: μ ~ Norm(0, 10) and σ ~ Exp(0.1).

Ten thousand pairs of μ and σ values were sampled from the priors. For each pair, a delay was sampled for each clinically reported case, and simulated wastewater data were generated by adding each case's delay to the date on which it was actually reported. The number of delayed cases per day was normalized by the sum. The average per-day squared error (SSE) between the simulated and the actual wastewater data was then calculated, and the pair (μ, σ) was accepted if the average SSE was less than a cutoff value ε. The cutoff ε was tuned so that approximately 10-20% of the iterations were accepted.
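The rejection-ABC procedure in the two paragraphs above can be sketched as follows. This is a minimal sketch, not the authors' code. It assumes that Norm(0, 10) denotes a standard deviation of 10 days, that Exp(0.1) denotes a rate of 0.1 (mean 10 days), that delays are rounded to whole days, and that simulated days falling outside the observation window are dropped. Instead of hand-tuning ε, it sets ε to a quantile of the simulated distances so that a chosen fraction of draws (15% by default, within the 10-20% range above) is accepted.

```python
import numpy as np


def abc_delay(case_days, wastewater, n_iter=10_000, accept_frac=0.15, seed=0):
    """Rejection ABC for the delay between wastewater shedding and case reporting.

    case_days: integer day index of each reported case (0 .. len(wastewater) - 1).
    wastewater: daily wastewater copy numbers on the same day index.
    Returns the accepted (mu, sigma) samples. Negative mu = wastewater leads.
    """
    rng = np.random.default_rng(seed)
    ww = np.asarray(wastewater, dtype=float)
    ww = ww / ww.sum()                          # normalize wastewater by its sum
    n_days = ww.size
    case_days = np.asarray(case_days)
    mus = rng.normal(0.0, 10.0, n_iter)         # mu ~ Norm(0, 10), read as sd = 10
    sigmas = rng.exponential(1 / 0.1, n_iter)   # sigma ~ Exp(0.1), read as rate 0.1 (mean 10 days)
    dist = np.full(n_iter, np.inf)
    for i in range(n_iter):
        # Shift each case by its own sampled delay (rounded to whole days).
        delays = rng.normal(mus[i], sigmas[i], case_days.size)
        sim_days = np.rint(case_days + delays).astype(int)
        sim_days = sim_days[(sim_days >= 0) & (sim_days < n_days)]  # drop out-of-window days
        if sim_days.size == 0:
            continue
        sim = np.bincount(sim_days, minlength=n_days) / sim_days.size  # normalized by the sum
        dist[i] = np.mean((sim - ww) ** 2)      # average per-day squared error
    eps = np.quantile(dist, accept_frac)        # epsilon chosen to accept ~accept_frac of draws
    keep = dist <= eps
    return mus[keep], sigmas[keep]


# Synthetic check: shedding appears in wastewater about 6 days before the report date.
rng = np.random.default_rng(1)
case_days = np.clip(np.rint(rng.normal(60, 10, 3000)), 0, 119).astype(int)
ww_days = np.rint(case_days + rng.normal(-6, 2, case_days.size)).astype(int)
ww_days = ww_days[(ww_days >= 0) & (ww_days < 120)]
wastewater = np.bincount(ww_days, minlength=120).astype(float)
mus, sigmas = abc_delay(case_days, wastewater, n_iter=3000, accept_frac=0.02)
print(f"posterior mean delay ~ {mus.mean():.1f} days")
```

Adding a sampled δ to each report date yields the simulated wastewater date, so a negative μ corresponds to wastewater preceding clinical reporting, matching the sign convention used in the Results.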

An approach similar to the one we previously reported was used to find the transfer function that describes the relationship between the shapes of the wastewater and clinical data (Wu et al., 2020a). This concept was borrowed from the field of signal processing, where a transfer function represents the mathematical relationship between the numerical input to a dynamic system and the resulting output (Pollock, 2011). Because the shapes rather than the magnitudes of the curves are of greatest relevance, the wastewater data were divided by the median ratio between wastewater and clinical data.

Clinically reported cases C(t) were modeled as the convolution of the scaled wastewater data W(t) with an unknown transfer function T(t), separately before and after August 15, 2020: log10(C(t)) = log10([T * W](t)), where [T * W](t) = Σ_τ T(τ) W(t − τ) denotes convolution. We hypothesized that the transfer function could be fit by a beta distribution with shape parameters α and β and a scaling factor c, because beta distributions take a rich variety of shapes with only two shape parameters. The score function was defined as the SSE between log10(clinically reported cases) and log10([T * W](t)). The L-BFGS method (Liu and Nocedal, 1989), as implemented in the scipy.optimize.minimize function, was used to find the parameters α, β, and c that minimized the SSE. L-BFGS was chosen because of its computational speed and low memory requirements (Liu and Nocedal, 1989). For initial parameter guesses, combinations of α = [2, 20, 50, 100, 200], β = [2, 20, 50, 100, 200], and c = [0.01, 0.1] were used, which give a wide variety of starting shapes for the transfer function.
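The fitting procedure above can be sketched with `scipy.optimize.minimize`, which exposes the limited-memory BFGS method as `method="L-BFGS-B"` (the bounded variant). The sketch makes assumptions the text does not specify: the beta density is evaluated at the midpoints of a hypothetical 30-day window to give daily kernel weights, days before the kernel fully overlaps the data and days with zero cases are excluded from the SSE, and the parameter bounds are illustrative.

```python
import itertools

import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta as beta_dist

WINDOW = 30  # hypothetical kernel length in days


def transfer_kernel(a, b, c, window=WINDOW):
    """Daily transfer-function weights: c * Beta(a, b) density at bin midpoints."""
    x = (np.arange(window) + 0.5) / window  # midpoints on (0, 1), one per lag day
    return c * beta_dist.pdf(x, a, b)


def predict_cases(params, ww_scaled, window=WINDOW):
    """Causal convolution [T * W](t) = sum_k T[k] * W[t - k]."""
    kernel = transfer_kernel(*params, window=window)
    return np.convolve(ww_scaled, kernel)[: len(ww_scaled)]


def sse(params, ww_scaled, cases, window=WINDOW):
    """Sum of squared errors between log10(cases) and log10([T * W](t))."""
    pred = predict_cases(params, ww_scaled, window)
    t = np.arange(len(cases))
    mask = (t >= window - 1) & (cases > 0)  # full kernel overlap, positive counts only
    resid = np.log10(cases[mask]) - np.log10(np.maximum(pred[mask], 1e-12))
    return float(np.sum(resid ** 2))


def fit_transfer_function(wastewater, cases, alphas=(2, 20, 50, 100, 200),
                          betas=(2, 20, 50, 100, 200), cs=(0.01, 0.1), window=WINDOW):
    """Multi-start L-BFGS-B fit; returns the best (alpha, beta, c) and its SSE."""
    ww = np.asarray(wastewater, dtype=float)
    cc = np.asarray(cases, dtype=float)
    positive = cc > 0
    ww_scaled = ww / np.median(ww[positive] / cc[positive])  # match magnitudes, keep shape
    best = None
    for x0 in itertools.product(alphas, betas, cs):
        res = minimize(sse, x0, args=(ww_scaled, cc, window), method="L-BFGS-B",
                       bounds=[(1.0, 500.0), (1.0, 500.0), (1e-6, 10.0)])
        if best is None or res.fun < best.fun:
            best = res
    return best.x, best.fun


# Synthetic check: cases generated from a known kernel applied to a smooth wave.
t = np.arange(200)
wastewater = 1000.0 * np.exp(-(((t - 100) / 20.0) ** 2)) + 10.0
cases = predict_cases((3.0, 8.0, 0.05), wastewater)
params, best_sse = fit_transfer_function(wastewater, cases)
```

Running every starting combination and keeping the lowest SSE guards against L-BFGS-B stopping in a local minimum, which is the purpose of the wide grid of initial guesses described above.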

Under the assumption of normally distributed errors, minimizing the SSE is equivalent to maximizing the likelihood (Berkson, 1956). MCMC simulation was used to explore the uncertainty landscape around the maximum likelihood estimate of the parameters of the inferred transfer function. MCMC methods have previously been used to quantify uncertainty in various fields (Bardsley and Fox, 2012; Hassan et al., 2009; Maybank et al., 2020). The Metropolis-Hastings algorithm, a ubiquitous and versatile MCMC algorithm (Hastings, 1970), was employed. Briefly, the algorithm started at the maximum likelihood estimates of α, β, and c. The transition function was defined as a normal distribution centered on the previous parameters, with standard deviations of 1, 1, and 0.001 for α, β, and c, respectively. At each iteration, a new set of parameters was proposed using the transition function and the log likelihood was computed. New parameters were accepted according to the Metropolis-Hastings acceptance rule: they were always accepted if the log likelihood was higher, and if the log likelihood was lower, they were accepted with probability exp(-ΔSSE), where ΔSSE is the increase in SSE relative to the current parameters. One hundred randomly selected accepted parameter sets were plotted in Figures 4B and 4D to illustrate the uncertainty around the maximum likelihood estimate of the transfer function. MCMC simulation was done with Python 3.6.5, numpy 1.14.3, pandas 0.23.0, and scipy 1.1.0.
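The Metropolis-Hastings step can be sketched as below. It uses the proposal standard deviations (1, 1, 0.001) and the acceptance rule described above, which corresponds to a log likelihood of -SSE up to a constant. Rejecting proposals with non-positive parameters is an assumption, and the quadratic `toy_sse` is a hypothetical stand-in for the transfer-function SSE so that the example runs on its own.

```python
import numpy as np


def metropolis_hastings(sse_fn, start, step=(1.0, 1.0, 0.001), n_iter=20_000, seed=0):
    """Random-walk Metropolis-Hastings around a starting (alpha, beta, c).

    Acceptance follows the rule in the text: always accept a lower SSE, otherwise
    accept with probability exp(-delta_SSE). Returns the accepted parameter sets.
    """
    rng = np.random.default_rng(seed)
    current = np.asarray(start, dtype=float)
    current_sse = sse_fn(current)
    accepted = []
    for _ in range(n_iter):
        proposal = current + rng.normal(0.0, step)  # normal transition centered on current
        if np.any(proposal <= 0):
            continue  # outside the support of alpha, beta, c: reject (assumption)
        proposal_sse = sse_fn(proposal)
        delta = proposal_sse - current_sse
        if delta <= 0 or rng.random() < np.exp(-delta):
            current, current_sse = proposal, proposal_sse
            accepted.append(current.copy())
    return np.array(accepted)


def toy_sse(p):
    """Hypothetical quadratic stand-in for the transfer-function SSE."""
    return (p[0] - 5.0) ** 2 + (p[1] - 8.0) ** 2 + ((p[2] - 0.05) / 0.001) ** 2


chain = metropolis_hastings(toy_sse, start=(5.0, 8.0, 0.05))
rng = np.random.default_rng(1)
subset = chain[rng.choice(len(chain), size=100, replace=False)]  # 100 random accepted sets
```

In practice `sse_fn` would be the transfer-function score evaluated on the scaled wastewater and case data, and `start` would be the L-BFGS estimate.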

Analysis of our 14-month-long time series of wastewater copy numbers of SARS-CoV-2 gene fragments, spanning March 4, 2020 to May 13, 2021, showed two distinct waves of SARS-CoV-2 in the Boston Area. Similar trends appeared in wastewater viral copy numbers and clinical cases: an exponential rise from March to mid-April, a decline through July, a slow increase over the summer, and then a sharper increase in the fall and a second peak in the winter (Figure 1), indicating that wastewater viral copy numbers generally mirrored trends in disease incidence. Previous reports have shown that SARS-CoV-2 concentrations in wastewater may be affected by flow rate and other physicochemical properties, such as total suspended solids, to varying degrees across wastewater treatment plants (Amoah et al., 2022; Paul et al., 2021). In both the northern and southern influents of Deer Island Treatment Plant, SARS-CoV-2 concentrations were not strongly correlated with flow or physicochemical properties (Table S2, Table S3). The robustness of the SARS-CoV-2 measurements to variation in flow and total solids is likely due to normalization with PMMoV, which corrects for dilution of human fecal material.

Wastewater and clinical data were compared with dates of known policy changes and social gatherings in the Boston Area and key trends were noted. For example, wastewater viral copy numbers and clinical cases continued to decline overall after Memorial Day (May 25, 2020), despite the potential for large gatherings to celebrate the holiday ( Figure 1) . Similarly, the social justice protests during late May and June did not immediately spark an increase in clinical cases or wastewater viral copy numbers ( Figure 1 ). The start of Phase 2 Step 2 (Table S1) marked the start of a steady increase in wastewater viral copy numbers and clinical cases during the summer (Figure 1 ).

There was also a steeper increase in wastewater viral copy numbers and clinical cases after the Indigenous Peoples' Day holiday (October 12, 2020). This increase continued through reopening Phase 3 Step 2 and peaked between late November and January, around the Thanksgiving, Christmas, and New Year holidays, perhaps due to increased indoor gatherings (Figure 1).

However, trends in wastewater data differed from clinical data after some key events, suggesting a decoupling of wastewater and clinical trends that we hypothesize provides insight into the dynamics of COVID-19 in the community. There was a short peak in wastewater viral copy numbers at the start of August, which was only slightly reflected in the clinical data (Figure 1). Similarly, after colleges and universities welcomed students back in late August/early September, there was another peak in wastewater viral copy numbers, but not in clinical cases (Figure 1). After the start of Phase 3 Step 2 reopening, wastewater viral copy numbers increased steeply, while clinical cases rose with a shallower slope (Figure 1). These differences could be due to inherent differences between wastewater and clinical data. For example, wastewater measurements of SARS-CoV-2 may vary with differences in viral shedding among asymptomatic, symptomatic, mild, and severe cases, whereas a clinical case is either reported or not. However, we hypothesized that the distinct trends in wastewater copy numbers relative to clinical data suggest that wastewater viral copy numbers may reflect the impact of social activities more sensitively than clinical cases. This observation prompted us to derive a metric for detecting discordance between wastewater and clinical trends.

Wastewater surveillance captures all individuals who are shedding the virus into the sewershed, regardless of disease manifestation, access to testing, or representation in clinical case data. As such, wastewater trends are commonly benchmarked against clinical cases. However, there is no established quantitative measure for comparing wastewater and clinical trends. Here, we propose using the ratio of wastewater viral copy numbers to clinical cases (WC ratio) as a metric for detecting differences between wastewater and clinical trends.

Changes in the WC ratio can serve as an indicator of potential under- or overestimation of disease incidence. Undercounting of clinical cases could occur when clinical tests are limited, when people are not seeking testing or cannot access convenient testing locations, or when the proportion of asymptomatic infections is high.

Overestimation of disease incidence could occur when rapid expansion of testing infrastructure allows many people who were infected in prior weeks to get tested, so that their results appear as new cases even though they were infected weeks earlier. This is plausible because throat and nasal swab PCR tests for SARS-CoV-2 can remain positive for up to 20 days after symptom onset (Wölfel et al., 2020).

Because viral copy numbers vary with the number of people in the catchment, the magnitude and range of the WC ratio must be established for each catchment. A high ratio implies that existing testing capacity has not kept pace with exponentially rising new infections, which are nevertheless detected by wastewater surveillance. A high WC ratio can arise when wastewater viral copy numbers are increasing but clinical cases are not rising equivalently, or when wastewater viral copy numbers are stable but clinical testing is decreasing. By contrast, a low WC ratio indicates that clinical tests are capturing the majority of infections reflected in wastewater viral copy numbers, and a stable, low ratio implies that existing testing capacity is sufficient to assess the extent of new infections. Importantly, changes in the WC ratio relative to a stable baseline may provide early indications of changing epidemic dynamics. Increases may highlight new bursts of infections before they are captured in clinical data, or identify periods when clinical tests are not capturing the full extent of new infections, whereas decreases may highlight periods when clinical testing overestimates disease incidence by counting previously infected individuals as new cases. Notably, this ratio changed by approximately two orders of magnitude over the course of the pandemic, demonstrating its utility as a metric to assess the public health response.
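The WC ratio and its change relative to a baseline period can be computed from two daily series. This is a sketch under assumptions not stated in the text: both series share a date index, a 7-day trailing mean smooths each series before dividing, days with zero cases yield no ratio, and change is reported as a log10 fold change from the median over a user-chosen baseline window, because the ratio spans orders of magnitude.

```python
import numpy as np
import pandas as pd


def wc_ratio(wastewater, cases, window=7):
    """WC ratio: smoothed wastewater copies per L divided by smoothed daily cases.

    wastewater, cases: pandas Series indexed by date. The trailing rolling mean
    (default 7 days) is an assumption; days with zero cases give NaN.
    """
    df = pd.DataFrame({"ww": wastewater, "cases": cases})
    smooth = df.rolling(window, min_periods=1).mean()
    return smooth["ww"] / smooth["cases"].where(smooth["cases"] > 0)


def log10_change_from_baseline(ratio, start, end):
    """log10 fold change of the WC ratio relative to its median over [start, end]."""
    baseline = ratio.loc[start:end].median()
    return np.log10(ratio / baseline)


# Hypothetical data: wastewater flat while reported cases drop tenfold.
dates = pd.date_range("2020-07-01", periods=10, freq="D")
ww = pd.Series(1e5, index=dates)
cases = pd.Series([100.0] * 5 + [10.0] * 5, index=dates)
change = log10_change_from_baseline(wc_ratio(ww, cases, window=1), "2020-07-01", "2020-07-05")
```

A jump of +1 in `change` is a tenfold rise in the ratio over baseline, the kind of positive deviation the text associates with infections that clinical testing is not capturing.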

At the beginning of the pandemic (March 2020), the ratio between wastewater viral copy numbers (viral genome copies (GC) per L) and reported clinical cases was very high (>10^3 for the Boston Area), indicating that cases were likely undercounted due to extremely limited testing (Figure 2A). In fact, during March 2020, the seven-day average of new molecular tests administered in the state did not exceed 5,000 tests per day (Figure 2B). As testing ramped up throughout April and May, the WC ratio fell by approximately two orders of magnitude (Figure 2A). In this phase, delayed clinical test results may have been "catching up" with the more instantaneous wastewater viral copy numbers, to which infected individuals had contributed many days before receiving their test results. Thus, these individuals were no longer contributing to wastewater viral copy numbers (the numerator of the WC ratio) but were now being counted as reported cases (the denominator of the WC ratio), leading to a much lower ratio.

As the public health response in Massachusetts began to ramp up over the summer and individual clinical testing became more available, the ratio between wastewater and clinical cases remained fairly consistent between July 2020 and November 2020 (Figure 2A ). During this time, Massachusetts' clinical testing capacity became fully established, and percent positivity remained stable below 2% (Massachusetts Department of Public Health, 2021b). Such periods could be used to determine baseline WC ratios that indicate sufficient public health capacity for testing.

Interestingly, even during the pronounced peak of the second wave (November 2020 -March 2021), the WC ratio did not spike, indicating that testing capacity was sufficient to capture the scope of exponentially rising new infections.

However, there were a few increases in the ratio even with fully established testing capacity, notably in early September and early October (Figure 2A ). These increases in the WC ratio could be related to community events, such as reopening of businesses and universities, or to changes in testing availability. These increases could also be due to a combination of factors including shifts in population demographics. For example, college students returning for classes in late August/early September may have been more likely to be asymptomatic due to their younger age, and thus less likely to be reflected in clinical case counts (Leidman et al., 2021; Leidner et al., 2021) .

open (Baker, 2021a), young adults may have had more social contacts. In these instances, wastewater surveillance may have detected a silent, short-lived peak in community transmission that was missed by clinical surveillance. Importantly, these short increases became more apparent when analyzing the ratio between the two datasets.

While the WC ratio could identify whether trends in clinical cases are concordant with trends in wastewater copy numbers, it does not provide information on the timeliness of clinical reporting, which would be an important measure of the public health response.

To address this gap, we introduce two methods to characterize this time lag between wastewater and clinical trends: (1) a model of the distribution of time lags for each new case, and (2) a transfer function to describe the relationship between the wastewater and clinical curves.

In our first method for assessing the time-varying relationship between wastewater data and clinical cases (Figure 3A), approximate Bayesian computation (ABC) was used to model the distribution of the time lag between when an infected person's viral shedding is detectable in wastewater and when they receive a clinical test (Methods). This model assumes that each infected person's viral shedding detected in wastewater occurs on a single day, early in infection. The assumption that wastewater reflects viral shedding early in infection is supported by animal models, meta-analysis of clinical data, and wastewater surveillance in dormitories (Bao et al., 2020; Hoffmann and Alsing, 2021; Schmitz et al., 2021). The date of each clinical case in this analysis corresponds to the date of specimen collection.

Negative time lags indicate that the wastewater signal precedes clinical testing, and positive time lags indicate the reverse. To compare the relationship between wastewater data and clinical data during the first and second waves of the pandemic, the time series was split on August 15, 2020, the approximate midpoint between the end of the first wave and the start of the second wave.

Before August 15, 2020, the wastewater signal preceded clinical cases by approximately 6.2 days (mean time lag -6.2 days, 95% CI: -10.1, -2.7; standard deviation 2.7 days, 95% CI: 0.1, 6.7) (Figure 3B-D). This finding is consistent with our previous report that wastewater data preceded clinical data by 4-10 days (Wu et al., 2020a). After August 15, 2020, the wastewater signal was more in phase with clinical cases (mean time lag 1.0 days, 95% CI: -2.4, 4.2; standard deviation 3.5 days, 95% CI: 0.2, 8.4) (Figure 3E-G). We repeated the modeling while varying the date of the split and found similar results (Figures S1-S4).

To confirm these modeling results, we next investigated the transfer function T(t) that transforms wastewater viral copy numbers to clinical cases. This concept was borrowed from the field of signal processing, where a transfer function represents the mathematical relationship between the numerical input to a dynamic system and the resulting output (Pollock, 2011) . While the previous ABC analysis modeled the delay for each individual case to find the distribution of delays, the transfer function analysis focuses on relating the shapes of the wastewater signal and clinical case signal. The shape of T(t) can provide insight on the time delay between the two signals and can reflect factors such as how long it takes someone to request testing and the probability that someone gets a positive test result over the course of their infection ( Figure 4A and C).

The shape of the transfer function changed between the first and second waves, reflecting changing relationships between infections and clinical testing. Using data before August 15, the inferred transfer function had a broad peak and long tail, with the peak at approximately 3 days and an average of 10 days, which is within the confidence interval of the ABC model ( Figure 4B ). The broad shape implies that the process of infected individuals getting counted as cases has a broad distribution, with some individuals getting reported very quickly but others taking up to weeks. In this situation, wastewater viral copy numbers could be an early indicator of disease dynamics before clinical test results come back positive. As the pandemic progressed, the inferred transfer function became more sharply peaked around a 1-day time lag, which is consistent with the results of the ABC model. This sharp distribution indicates that wastewater and reported cases track each other closely ( Figure 4D ). In this case, wastewater viral copy numbers have less utility as an early warning system because increased clinical testing capacity effectively captures new infections in a timely manner.
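Statements such as "peak at approximately 3 days and an average of 10 days" summarize the transfer function as a distribution of lags. A minimal sketch, assuming the transfer function is available as daily weights indexed by lag in days: the peak is the mode, the average is the weighted mean lag, and the weighted standard deviation distinguishes a broad, long-tailed kernel from a sharply peaked one. The kernel values below are hypothetical.

```python
import numpy as np


def transfer_function_summary(kernel):
    """Summarize a daily transfer-function kernel whose index is the lag in days.

    Returns (peak_lag, mean_lag, sd_lag): the mode, weighted mean, and weighted
    standard deviation of the lag distribution.
    """
    weights = np.asarray(kernel, dtype=float)
    weights = weights / weights.sum()
    lags = np.arange(weights.size)
    mean = float(np.sum(lags * weights))
    sd = float(np.sqrt(np.sum(weights * (lags - mean) ** 2)))
    return int(np.argmax(weights)), mean, sd


# Hypothetical right-skewed kernel: the mean lag sits to the right of the peak.
print(transfer_function_summary([0, 1, 3, 2, 1, 0.5]))  # peak 2, mean ~2.6, sd ~1.08
```

For a long-tailed kernel the mean lag exceeds the peak lag, which is why the first-wave transfer function can peak at about 3 days while averaging about 10.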

Taken together, these results suggest that the relationship and time lag between wastewater viral copy numbers and positive clinical tests changed over the course of the pandemic. Wastewater was more effective as a leading indicator in the first wave of the pandemic, and this early warning effect diminished drastically in the second wave, perhaps as clinical testing availability increased. Thus, parameters such as the time lag between wastewater signal and clinical signal and the transfer function describing their relationship could also be used to evaluate clinical test availability or capacity.

Given that wastewater SARS-CoV-2 copy numbers correlated with new clinical cases, we next investigated whether wastewater copy numbers could be used to predict COVID-19 deaths. The relationships among wastewater copy numbers, new cases, and deaths differed between the first and second waves. This suggests that wastewater may have a direct, mechanistic relationship with new cases but only an indirect relationship with deaths, so wastewater data must be used in conjunction with other public health datasets when making policy decisions.

In the first surge (Mar-May 2020), both new cases and deaths peaked shortly after wastewater peaked (Figure 5A). In this situation, wastewater copy numbers could be a predictive indicator of COVID-19 deaths. However, in the second surge (Nov 2020-Mar 2021), wastewater viral copy numbers and new cases rose to levels higher than in the first surge, while deaths remained lower than in the first wave (Figure 5A). In this situation, wastewater copy numbers were an indicator of new cases but not of deaths.

This difference could be due to a combination of several factors. First, the first wave strongly affected people in the 80+ age bracket, while the second wave saw an increase in positivity in the 0-29 age bracket (Wikle et al., 2020). Older adults with comorbidities have higher mortality rates from COVID-19 (Centers for Disease Control and Prevention, n.d.). Second, medical professionals gained experience in treating and managing the disease after the first wave, contributing to fewer deaths in the second wave (Q. . Third, changes in behavior, such as improved hygiene and social distancing, could reduce the viral inoculum, resulting in less severe disease (Spinelli et al., 2021). Fourth, the increase in testing means that a higher proportion of cases are diagnosed, possibly including people who are asymptomatic or only mildly ill, so the ratio of deaths to new cases would be lower. This explanation is consistent with the decrease in the WC ratio seen in the second wave (Figure 2). We also explored whether variants of concern with increased transmissibility, such as B.1.1.7 (Frampton et al., 2021; Graham et al., 2021), could have influenced the ratio of deaths to new cases. However, in the last week of January 2021, during the peak of the second wave, B.1.1.7 was estimated to account for an average of only about 2.1% of COVID-19 cases in the U.S. (Washington et al., 2021). Furthermore, B.1.1.7 did not make up a large proportion of wastewater viral copy numbers in the Boston area until Feb-March 2021, towards the end of the second wave.
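The fourth explanation can be made explicit with a deliberately simplified model. Assume, hypothetically, that wastewater copy numbers W(t) are proportional to new infections I(t) through a constant per-infection shedding factor s, that reported cases C(t) capture a fraction a(t) of infections (the ascertainment fraction), and that deaths D(t) are a fatality fraction f of infections, ignoring all reporting and outcome delays:

```latex
\begin{aligned}
W(t) &\approx s\, I(t), \qquad C(t) = a(t)\, I(t), \qquad D(t) \approx f\, I(t), \\
\mathrm{WC}(t) &= \frac{W(t)}{C(t)} \approx \frac{s}{a(t)}, \qquad
\frac{D(t)}{C(t)} \approx \frac{f}{a(t)} .
\end{aligned}
```

Under these assumptions, a rise in ascertainment a(t) lowers the WC ratio and the deaths-per-case ratio together, which is why the two observations point in the same direction; a fall in f (for example, from a younger infected population or improved treatment) would instead lower D/C without changing the WC ratio.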

Notably, wastewater viral copy numbers reflect the number of new cases regardless of the demographics or symptoms of cases. With the changing demographics of the disease, changes in clinical practice, and increased testing capacity, the power of wastewater to predict new cases and COVID-related deaths also changes. Therefore, wastewater should not be used alone to predict public health outcomes, but rather should be used in combination with other data sources to understand the pandemic and inform decision making.

Many groups have demonstrated that wastewater SARS-CoV-2 copy numbers reflect trends in new COVID-19 cases, and wastewater viral copy numbers have been observed to precede trends from clinical surveillance (Bar-Or et al., 2020; Kocamemi et al., 2020; Medema et al., 2020; Randazzo et al., 2020; Wurtzer et al., 2020). Here, we developed three new metrics to integrate wastewater and clinical data to quantitatively understand their relationship and the public health response. We introduced the ratio of wastewater viral copy numbers to clinical cases (WC ratio) as a simple metric which may reflect testing capacity. We also applied two independent types of modeling to quantify the lead time between wastewater data and clinical data. These models show that wastewater's lead time changes over the course of the pandemic as public health responses adapt. Finally, we showed that a third public health data stream, COVID-19 deaths, also has changing relationships with cases and wastewater over the pandemic, suggesting that wastewater data cannot be used alone and should be integrated with public health data to make policy decisions. Together, this work demonstrates the utility of combining wastewater surveillance with multiple public health data streams to provide a more nuanced view of the changing public health response to the COVID-19 pandemic.

The ratio between wastewater viral copy numbers and clinical cases (WC ratio) is a useful indicator of the public health response and testing coverage and could be used to gauge the intensity of public health interventions in the face of changing disease incidence. In addition, short-lived trends in increased transmission may be more easily detected when analyzing the WC ratio compared to analyzing wastewater and clinical data independently. For example, the WC ratio showed a small spike in early September and early October. These short-lived spikes could be due to community events, such as university reopening and economic reopening phases. These spikes could also be due to changing demographics of disease. As younger people are more likely to have mild or asymptomatic infections, they are less likely to be captured by clinical testing infrastructure, even when tests are readily available. Therefore, the WC ratio should be considered when interpreting wastewater surveillance data because it may help detect asymptomatic disease transmission among the population.
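A minimal sketch of computing the WC ratio from daily series, assuming (as in the figures) that both series are first smoothed with seven-day averages. The trailing rather than centered window, the function names, and returning None where the ratio is undefined are assumptions for illustration:

```python
from typing import List, Optional, Sequence


def trailing_mean(x: Sequence[float], window: int = 7) -> List[Optional[float]]:
    """Trailing moving average; None until a full window is available."""
    out: List[Optional[float]] = []
    for i in range(len(x)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(x[i + 1 - window:i + 1]) / window)
    return out


def wc_ratio(copies: Sequence[float], cases: Sequence[float],
             window: int = 7) -> List[Optional[float]]:
    """Ratio of smoothed wastewater copy numbers to smoothed new clinical cases.
    Returns None where either average is unavailable or cases average to zero."""
    if len(copies) != len(cases):
        raise ValueError("series must have equal length")
    w = trailing_mean(copies, window)
    c = trailing_mean(cases, window)
    return [None if wi is None or ci is None or ci == 0 else wi / ci
            for wi, ci in zip(w, c)]
```

Because cases sit in the denominator, days with very few reported cases make the ratio unstable, so a minimum case threshold may be needed in practice.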

We also introduced two models showing that the delay between wastewater data and clinical reporting shrank from an average of 6.2 days to no significant lag between the wastewater and clinical trends after August 15, 2020. In addition, the inferred transfer function between wastewater viral copy numbers and reported cases shifted from a broad to a sharp peak, suggesting quicker access to clinical testing. These results suggest that wastewater surveillance can be useful as an early warning indicator of disease incidence when clinical testing is limited, and also as a way to understand the ramp-up and scale-down of community-based testing capacity in different phases of the pandemic. Importantly, modeling the time lag and transfer function between wastewater viral copy numbers and clinical cases provides a more quantitative way to understand their relationship and assess the pandemic response.
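The study estimated the lead time with an ABC model and a transfer function. A simpler, commonly used alternative, swapped in here only for illustration and not the authors' method, is to find the shift that maximizes the correlation between the wastewater series and the lagged case series. The sketch below assumes equally spaced daily values and considers only nonnegative lags (wastewater leading cases); the function names are hypothetical.

```python
from typing import Sequence


def _pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; NaN if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")
    return cov / (va * vb) ** 0.5


def best_lag(leading: Sequence[float], lagging: Sequence[float], max_lag: int) -> int:
    """Shift d in [0, max_lag] that maximizes corr(leading[t], lagging[t + d])."""
    if len(leading) != len(lagging):
        raise ValueError("series must have equal length")
    if max_lag >= len(leading) - 1:
        raise ValueError("max_lag too large for the series length")
    best_d, best_r = 0, float("-inf")
    for d in range(max_lag + 1):
        r = _pearson(leading[:len(leading) - d], lagging[d:])
        if r > best_r:  # comparisons with NaN are False, so constant windows are skipped
            best_d, best_r = d, r
    return best_d
```

Applied separately to data before and after a cutoff date, such a scan could show a lead time shrinking toward zero, but it yields a single best lag rather than the delay distribution that the ABC and transfer function analyses provide.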

There could be many factors contributing to the decreasing delay between trends in wastewater and trends in clinical cases. Changing criteria to qualify for clinical testing, individual behavior in requesting tests, availability of convenient testing locations, and laboratory turnaround time can all affect the time lag between when a patient is infected and when their positive result is reported. At the beginning of the first wave, clinical testing was largely limited to people who met a restrictive combination of symptom and exposure criteria, gradually expanding to those who had an exposure history (Becker, 2020; Brown et al., 2020). However, the launch of the Massachusetts #StopTheSpread program on July 8, 2020, and its expansion on August 7, 2020, made walk-up, on-demand testing available to the public, aiding in the identification of asymptomatic and pre-symptomatic cases (Murphy, 2020; Office of Governor Charlie Baker and Lt. Governor Karyn Polito et al., 2020). Widespread testing by local colleges further expanded this segment of identified cases through the fall and winter (Broad Institute, 2020). Reported case counts thus depended heavily on public health resources and policy. Wastewater surveillance is not subject to these social and logistical limitations and can therefore serve as a more immediate and less biased readout of new cases during the pandemic. We and others have shown that wastewater likely captures a short period of high viral shedding early in infection (Hoffmann and Alsing, 2021; Wu et al., 2020a), whereas patients can test positive by PCR of respiratory samples for longer periods (Wölfel et al., 2020; Zheng et al., 2020), suggesting that wastewater could be more specific to newly infected patients.

However, wastewater surveillance does not necessarily provide a readout of hospitalizations or deaths, because these outcomes also depend on who is infected and on their access to healthcare, neither of which can be distinguished via wastewater monitoring. Therefore, wastewater should be used in conjunction with additional clinical data streams when making public health decisions related to hospitalizations and mortality.

This study has several limitations. First, the interpretation of the WC ratio relies on the assumption that the viral shedding rate did not change drastically over the course of the pandemic. While some variants of SARS-CoV-2 have been reported to have higher shedding rates or longer shedding duration (Frampton et al., 2021; Kissler et al., 2021), the B.1.1.7 variant did not make up a large proportion of wastewater viral copy numbers in the Boston area until March 2021, well after the summer periods in which we described notable peaks in the WC ratio. There have been mixed reports on the difference in shedding rate between symptomatic and asymptomatic people (Van Vinh Chau et al., 2020; Zhou et al., 2020), but in any case, it is unlikely that the ratio of asymptomatic to symptomatic cases would change swiftly in a catchment of 2.25 million people. Second, there are some fluctuations in the WC ratio throughout summer 2020 and spring 2021 that we were not able to tie to community events that potentially increased social contacts and disease transmission. These fluctuations could arise from noise in both datasets and from community behaviors that were not considered. Private gatherings would be hard to monitor, but a more detailed analysis of mobility data could improve our interpretation of these fluctuations. Third, in practice, the WC ratio is particularly useful for detecting deviations from a stable baseline value. However, assessing such deviations in real time could be difficult, especially in the early stages of a pandemic before a baseline is established. Additionally, it could be difficult to distinguish a shifting baseline from true long-term trends. It is also difficult to use clinical data to verify the silent community spread detected by the WC ratio in the early summer, particularly if those affected were asymptomatic and did not seek hospitalization. In any case, increases in the WC ratio could prompt officials to increase their public health messaging to quell these silent peaks and to remain on alert for further trends.
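The baseline problem described above can be seen in a minimal real-time sketch: flag a day when the WC ratio exceeds a fixed multiple of the median of the preceding values, using only past data. The 28-day window, the two-fold threshold, and the function name are hypothetical choices, not values from the study.

```python
from statistics import median
from typing import List, Optional, Sequence


def flag_deviations(ratio: Sequence[Optional[float]], baseline_days: int = 28,
                    fold: float = 2.0) -> List[bool]:
    """Flag days whose WC ratio exceeds `fold` times the median of the previous
    `baseline_days` valid values. Only past data are used, so nothing can be
    flagged until a full baseline window has accumulated; None entries are skipped."""
    flags: List[bool] = []
    history: List[float] = []
    for r in ratio:
        if r is None:
            flags.append(False)
            continue
        if len(history) >= baseline_days:
            baseline = median(history[-baseline_days:])
            flags.append(r > fold * baseline)
        else:
            flags.append(False)  # warm-up: no baseline established yet
        history.append(r)
    return flags
```

Because the trailing median follows slow drifts, a gradual long-term rise would be absorbed into the baseline rather than flagged, which mirrors the difficulty of separating a shifting baseline from true long-term trends.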

Application of these metrics to other municipalities requires several considerations. Differences in sewer system residence time, wastewater sample matrix, and sample processing methods could change the relationship between measured wastewater SARS-CoV-2 genome copies and clinical new cases. For example, longer sewer residence times may stretch out the wastewater curve, as the positive signals in wastewater would take longer to reach the sampling point at the WWTP. Viral degradation in wastewater during transport would also affect the measured wastewater concentrations, particularly if there were hot spots of infection in the sewershed.
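A toy illustration of both effects, assuming a hypothetical transport model in which material shed at step t reaches the plant k steps later with probability p(k) and decays at a constant first-order rate during transit. The time step (for example, hours), the residence-time distribution, the rate, and the function name are placeholders rather than properties of any real sewershed:

```python
import math
from typing import List, Sequence


def route_through_sewer(shed: Sequence[float], residence_pmf: Sequence[float],
                        decay_rate: float) -> List[float]:
    """Signal arriving at the sampling point at each time step.

    shed[t]: amount of viral RNA entering the sewer at step t
    residence_pmf[k]: probability that transit takes k steps (sums to 1)
    decay_rate: first-order decay constant per step during transit
    Arrivals falling beyond the end of the series are dropped.
    """
    out = [0.0] * len(shed)
    for t, amount in enumerate(shed):
        for k, p in enumerate(residence_pmf):
            if t + k < len(out):
                out[t + k] += amount * p * math.exp(-decay_rate * k)
    return out
```

Spreading `residence_pmf` over more steps lowers and delays the peak of the arriving signal, and a larger `decay_rate` reduces its total magnitude.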

Quantifying SARS-CoV-2 genome copies from sludge rather than primary influent could also lead to a delayed and extended wastewater signal, as it takes time for solids to settle into sludge and SARS-CoV-2 persists longer in sludge than in the aqueous fraction (Balboa et al., 2021). Either of these situations (longer sewer residence times or sludge sampling) may lead to a smaller time lag between wastewater and clinical data. The persistence of SARS-CoV-2 viral particles is affected by environmental factors that differ across water matrices, such as temperature, UV exposure, organic matter, disinfectants, and antagonistic microorganisms (Amoah et al., 2022; Paul et al., 2021), which can affect the baseline magnitude of the WC ratio. Finally, laboratory methods for sample processing, such as pasteurization and the choice of quantification method, could affect the measured wastewater SARS-CoV-2 concentrations through viral degradation or different limits of detection. Thus, while these metrics could still be informative for decision makers, it will be important for municipalities to establish baseline values for the metrics in their specific catchments before using them to make public health decisions.

There are many extensions to this work that will make wastewater surveillance data more integrated and useful for the public health response. For example, wastewater viral copy numbers and clinically reported cases are both inherently linked to the number of true infections in the population. More mechanistic modeling is necessary to infer the underlying trend of true infections from the observable wastewater and clinical data (Fernandez-Cassi et al., 2021). Similarly, wastewater viral copy numbers are linked to true infections by a transfer function that describes population-level shedding, while clinical cases are linked to true infections by a transfer function that describes population-level testing availability and turnaround time. Inferring these two transfer functions would allow us to understand these parameters separately.
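One way to write this two-transfer-function picture, offered as a hedged formalization rather than a model fitted in this study: let I(t) denote true new infections, S(τ) a population-level shedding kernel, and Q(τ) a population-level testing-and-reporting kernel (S and Q are notation introduced here). If the observed wastewater signal W(t) and reported cases C(t) are both causal convolutions of I(t), then the single transfer function T(τ) relating W to C is tied to the two underlying kernels:

```latex
\begin{aligned}
W(t) &= \sum_{\tau \ge 0} S(\tau)\, I(t-\tau), \qquad
C(t) = \sum_{\tau \ge 0} Q(\tau)\, I(t-\tau), \\
C(t) &= \sum_{\tau \ge 0} T(\tau)\, W(t-\tau) \ \text{for every } I
\quad\Longleftrightarrow\quad
Q(\tau) = \sum_{u=0}^{\tau} T(u)\, S(\tau-u) .
\end{aligned}
```

This makes explicit why T alone cannot separate shedding from testing dynamics: a sharper T could reflect a faster testing kernel Q, a change in shedding S, or both, and estimating S and Q individually requires additional structure such as an explicit model for I(t).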

Combining wastewater data with clinical and demographic data may also allow us to infer these transfer functions per demographic group, giving a finer-resolution understanding of viral shedding parameters and access to testing. As the pandemic has progressed and public health measures have changed, the utility of wastewater surveillance has also changed. In this time-varying context, integrative models can make wastewater data more flexible, useful, and predictive.

We introduced three new metrics to assess the time-varying relationship between wastewater data and clinical data. Applying these metrics to a 14-month time series of wastewater surveillance data in Massachusetts, we conclude:

- The relationship between wastewater viral copy numbers and clinically reported cases changed over the course of the COVID-19 pandemic.
- Wastewater surveillance data served as a leading indicator in the first wave (~6 days) but not the second wave, likely due to substantially increased testing capacity.
- Evaluating the relationships between wastewater and various public health data streams using these new metrics can provide a real-time evaluation of public health responses.
- More integrative models can help increase the utility and application of wastewater surveillance for managing the ongoing COVID-19 pandemic as well as future pandemics.

MM and NG are cofounders of Biobot Analytics. EJA is an advisor to Biobot Analytics. TBE and PRC are medical consultants to Biobot Analytics. CD, NE, MI, and KAM are employees of Biobot Analytics, and all these authors hold shares in the company.

clinical cases. Seven-day averages of wastewater viral copy numbers (blue) and new clinical cases reported for the three counties in the catchment (orange) (41-43). We marked major holidays (top), major social events (middle), and state reopening phases (bottom) in the three panels, respectively (Baker, 2021a, p. 2; 2021b; 2020a; 2020b; 2020c; 2020d). Black arrows indicate peaks in wastewater SARS-CoV-2 RNA copy numbers that were not reflected in clinical case counts.

parameter sets in blue. Before 8/15, the transfer function has a broad peak and long tail. After 8/15, the transfer function becomes more sharply peaked.

First confirmed detection of SARS-CoV-2 in untreated wastewater in Australia: A proof of concept for the wastewater surveillance of COVID-19 in the community

Effect of selected wastewater characteristics on estimation of SARS-CoV-2 viral load in wastewater

Order advancing all communities to Phase III, Step 2 of the Commonwealth's reopening plan

Order advancing all communities to Phase IV, Step 1 of the Commonwealth's reopening plan and transitioning to a travel advisory policy

Order implementing a phased reopening of workplaces and imposing workplace safety measures to address COVID-19

Order authorizing the re-opening of Phase II enterprises

Order further advancing the re-opening of Phase II enterprises

Order authorizing the re-opening of Phase III enterprises

Order further advancing Phase III re-openings in municipalities with reduced incidence of COVID-19 infection

Order returning all municipalities to Phase III, Step 1 COVID-19 safety rules

Executive Order No. 591: Declaration of a State of Emergency to Respond to COVID-19

The fate of SARS-COV-2 in WWTPS points out the sludge line as a suitable spot for detection of COVID-19

The pathogenicity of SARS-CoV-2 in hACE2 transgenic mice

An MCMC method for uncertainty quantification in nonnegativity constrained inverse problems

Regressing SARS-CoV-2 sewage measurements onto COVID-19 burden in the population: a proof-of-concept for quantitative environmental surveillance

State: Massachusetts Has Tested More Than 200 People For Coronavirus

Estimation by Least Squares and by Maximum Likelihood

Wastewater-based Epidemiology for Averting COVID-19 Outbreaks on The University of Arizona Campus

Broad Institute provides COVID-19 screening for students, faculty, and staff at more than 100 colleges and universities

Department of Public Health, 2020. Testing of Persons with Suspect COVID-19

COVID-19 Data Dashboard [WWW Document]

Inferring SARS-CoV-2 RNA shedding into wastewater relative to time of infection (preprint)

National Wastewater Surveillance System

Older adults at greater risk of requiring hospitalization or dying if diagnosed with COVID-19

Time Evolution of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) in Wastewater during the First Pandemic Wave of COVID-19 in the Metropolitan Area of

Approximate Bayesian Computation (ABC) in practice

Nationwide trends in COVID-19 cases and SARS-CoV-2 wastewater concentrations in the United States (preprint)

Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic ABC

Wastewater monitoring outperforms case numbers as a tool to track COVID-19 incidence dynamics when test positivity rates are high

Genomic characteristics and clinical effect of the emergent SARS-CoV-2 B.1.1.7 lineage in London, UK: a whole-genome sequencing and hospital-based cohort study

SARS-CoV-2 wastewater surveillance data can predict hospitalizations and ICU admissions

Early-pandemic wastewater surveillance of SARS-CoV-2 in Southern Nevada: Methodology, occurrence, and incidence/prevalence considerations

Wastewater Surveillance on a University Campus

Changes in symptomatology, reinfection, and transmissibility associated with the SARS-CoV-2 variant B.1.1.7: an ecological study

A new real-time PCR method to overcome significant quantitative inaccuracy due to slight amplification inhibition

Viral RNA Load in Mildly Symptomatic and Asymptomatic Children with COVID-19

Wastewater surveillance for SARS-CoV-2 on college campuses: Initial efforts, lessons learned and research needs

Using Markov Chain Monte Carlo to quantify parameter uncertainty and its effect on predictions of a groundwater flow model

Monte Carlo Sampling Methods Using Markov Chains and Their Applications

Faecal shedding models for SARS-CoV-2 RNA amongst hospitalised patients and implications for wastewater-based epidemiology

Densely sampled viral trajectories suggest longer duration of acute infection

First Data-Set on SARS-CoV-2 Detection for Istanbul Wastewaters in Turkey

Mathematical modeling based on RT-qPCR analysis of SARS-CoV-2 in wastewater as a tool for epidemiology

Quantitative detection of SARS-CoV-2 B.1.1.7 variant in wastewater by allele-specific RT-qPCR

COVID-19 Trends Among Persons Aged 0-24 Years --United States

Opening of Large Institutions of Higher Education and County-Level COVID-19 Incidence --United States

On the limited memory BFGS method for large scale optimization

A Novel COVID-19 Early Warning Tool: Moore Swab Method for Wastewater Surveillance at an Institutional Level

The experiences of health-care providers during the COVID-19 crisis in China: a qualitative study

Effect of Heat inactivation on Real-Time Reverse Transcription PCR of the SARS-COV-2 Detection

Archive of COVID-19 Cases in Massachusetts-COVID-19 Raw Data

COVID-19 Interactive Data Dashboard [WWW Document]

Massachusetts Department of Public Health, 2020a. Archive of COVID-19 Cases in Massachusetts-COVID-19 Raw Data

MCMC for Bayesian uncertainty quantification from time-series data

Presence of SARS-Coronavirus-2 RNA in Sewage and Correlation with Reported COVID-19 Prevalence in the Early Stage of the Epidemic in The Netherlands

"Stop The Spread" Initiative Will Increase Testing Capacity In Eight Mass

Temporal Detection and Phylogenetic Assessment of SARS-CoV-2 in Municipal Wastewater

Quarantine alone or in combination with other public health measures to control COVID-19: a rapid review

Governor's Press Office, Massachusetts Executive Office of Health and Human Services

COVID-19 Dashboard

Applications of wastewater-based epidemiology as a leading indicator for COVID-19

Heat Inactivation of Different Types of SARS-CoV-2 Samples: What Protocols for Biosafety, Molecular Detection and Serological Diagnostics?

A review of the impact of environmental factors on the fate and transport of coronaviruses in aqueous environments

Measurement of SARS-CoV-2 RNA in wastewater tracks community infection dynamics

SARS-CoV-2 RNA concentrations in primary municipal sewage sludge as a leading indicator of COVID-19 outbreak dynamics (preprint)

Presented at the Discussion Papers in Economics 11/15, Division of Economics

SARS-CoV-2 RNA in wastewater anticipated COVID-19 occurrence in a low prevalence area

Enumerating asymptomatic COVID-19 cases and estimating SARS-CoV-2 fecal shedding rates via wastewater-based epidemiology

Suitability of pepper mild mottle virus as a human enteric virus surrogate for assessing the efficacy of thermal or free-chlorine disinfection processes by using infectivity assays and enhanced viability PCR

Importance of non-pharmaceutical interventions in lowering the viral inoculum to reduce susceptibility to infection by SARS-CoV-2 and potentially disease severity

Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems

The Natural History and Transmission Potential of Asymptomatic Severe Acute Respiratory Syndrome Coronavirus 2 Infection

Genomic epidemiology identifies emergence and rapid transmission of SARS-CoV-2 B.1.1.7 in the United States

SARS-CoV-2 epidemic after social and economic reopening in three US states reveals shifts in age structure and clinical characteristics

Virological assessment of hospitalized patients with COVID-2019

SARS-CoV-2 RNA concentrations in wastewater foreshadow dynamics and clinical presentation of new COVID-19 cases

SARS-CoV-2 titers in wastewater foreshadow dynamics and clinical presentation of new COVID-19 cases

Wastewater surveillance of SARS-CoV-2 across 40 U.S. states from February to

SARS-CoV-2 titers in wastewater are higher than expected from clinically confirmed cases

Evaluation of lockdown impact on SARS-CoV-2 dynamics through viral genome quantification in Paris wastewaters

Viral load dynamics and disease severity in patients infected with SARS-CoV-2 in Zhejiang province

Viral dynamics in asymptomatic patients with COVID-19

We thank the Deer Island wastewater treatment facility team for providing the samples.