key: cord-0899118-cz9vd1u3 authors: Tso, C. F.; Garikipati, A.; Green-Saxena, A.; Mao, Q.; Das, R. title: An exploratory study on the correlation of population SARS-CoV-2 cycle threshold values to local disease dynamics date: 2021-02-19 journal: nan DOI: 10.1101/2021.02.16.21251844 sha: 3654e0b054de94961872644a5845b6daa81df201 doc_id: 899118 cord_uid: cz9vd1u3 Introduction: Despite limitations on the use of cycle threshold (CT) values for individual patient care, population distributions of CT values may be useful indicators of local outbreaks. Methods: Specimens from the greater El Paso area were processed in the Dascena COVID-19 Laboratory. Daily median CT value, daily transmission rate R(t), daily count of COVID-19 hospitalizations, daily change in percent positivity, and rolling averages of these features were plotted over time. Two-way scatterplots and linear regression evaluated possible associations between daily median CT and outbreak measures. Cross-correlation plots determined whether a time delay existed between changes in the daily median CT value and measure of community disease dynamics. Results: Daily median CT was negatively correlated with the daily R(t), the daily COVID-19 hospitalization count (with a time delay), and the daily change in percent positivity among testing samples. Despite visual trends suggesting time delays in the plots for median CT and outbreak measures, a statistically significant delay was only detected between changes in median CT and COVID-19 hospitalization count. Conclusions: This study adds to the literature by analyzing samples collected from an entire geographical area, and contextualizing the results with other research investigating population CT values. . 
Preliminary analyses of simulation and surveillance testing data suggest that decreases in the distribution of CT values in a population, as measured by the median CT value, may precede a local outbreak, such that the median CT value may be a useful tool in predicting a surge (23, 24). The present study describes an exploratory analysis of potential correlations between median CT values and COVID-19 disease dynamics, operationalized as percent positivity, transmission rate, and COVID-19 hospitalizations. The samples in this study were collected between September 15th, 2020 and January 11th, 2021 as part of the ongoing diagnostic evaluation services provided by Dascena, Inc. to residents in the state of Texas. In the greater El Paso area, a contractor for the El Paso Department of Public Health sends over 90% of collected samples to the Dascena COVID-19 Laboratory in Houston, Texas. The Pearl Independent Institutional Review Board (IRB) approved this study (IRB Protocol 21-DASC-127). This study included nasopharyngeal swabs, salivary samples, one anterior nares swab, and samples for which the type of biological specimen was not specified. The overwhelming majority of samples were nasopharyngeal swabs. All biological samples were sent to the Clinical Laboratory Improvement Amendments (CLIA)-certified Dascena Laboratory. All samples were analyzed with the TaqPath COVID-19 Combo Kit (Thermo Fisher Scientific, Waltham, Massachusetts), with extraction performed with a MagMAX RNA Isolation Kit (Thermo Fisher Scientific, Waltham, Massachusetts). Three gene targets are used by these assays, and any of them may be the source of a positive result: the nucleocapsid (N) gene, the spike (S) gene, and the open reading frame 1ab (ORF1ab) gene (25). RT-PCR was run only once on any unique sample. For each RT-PCR test, the CT value was recorded.
Only samples that produced a valid CT value for a positive COVID-19 test (i.e., at least 2 genes generating a positive signal with a CT value ≤ 37) were used to determine the daily median CT value and in subsequent correlation analyses. The following demographic data were available for testing samples: age, sex, race, ethnicity, and zip code of residence. Testing samples from the greater El Paso area were selected based on the zip codes listed as part of the El Paso metropolitan statistical area (MSA) by the US Department of Labor, Office of Workers' Compensation Programs (26). The daily percent positivity rate was calculated among all samples tested by Dascena from the greater El Paso area. The effective reproduction number, or transmission rate R(t), was derived using the open-source algorithm from the COVID-19 tracking website rt.live (27). The algorithm is a Python script based on a Bayesian estimation model developed by Bettencourt & Ribeiro (28), with a slight modification to introduce Gaussian noise to the prediction. Daily new COVID-19 case data from individual counties were obtained from the COVID-19 Dashboard by the Center for Systems Science and Engineering at Johns Hopkins University (1), grouped by MSA, and fed into rt.live's algorithm to generate a time series for R(t). The daily number of individuals hospitalized with COVID-19 in the El Paso area was derived from publicly available data produced by the Texas Department of State Health Services that are grouped by trauma service area (29).

medRxiv preprint (which was not certified by peer review), this version posted February 19, 2021; https://doi.org/10.1101/2021.02.16.21251844. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
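The inclusion criterion for positive samples stated in the Methods (at least 2 gene targets with a CT value ≤ 37) and the resulting daily median CT can be sketched as follows. This is a minimal illustration, not the authors' code: the column names `collection_date`, `ct_n`, `ct_s` and `ct_orf1ab` are hypothetical, and a missing CT value (NaN) is assumed to mean that the target did not amplify.

```python
import pandas as pd

CT_CUTOFF = 37.0
# Hypothetical column names for the three TaqPath gene targets.
GENE_COLUMNS = ["ct_n", "ct_s", "ct_orf1ab"]


def daily_median_ct(tests: pd.DataFrame, gene: str = "ct_n") -> pd.Series:
    """Daily median CT of one gene among samples meeting the positivity rule."""
    # A target counts as positive when a CT value exists and is <= the cutoff.
    # NaN compares as False, so non-amplifying targets are not counted.
    positive_targets = tests[GENE_COLUMNS].le(CT_CUTOFF).sum(axis=1)
    positives = tests[positive_targets >= 2]
    # Series.median skips NaN, so a positive sample lacking this gene is ignored.
    return positives.groupby("collection_date")[gene].median()
```

The function returns one value per collection date, which is the series that the later smoothing and correlation steps operate on.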
In order to contextualize the results, a focused literature search was performed for peer-reviewed publications and pre-print manuscripts on the use of CT value measurements across a population as a means of predicting or monitoring COVID-19 outbreaks. Three pre-prints were identified (23, 24, 30). The datasets from the present study and the pre-prints were then compared in terms of: source population; type of testing; sample size; biological sample types included; duration of study period; gene target(s) of RT-PCR tests; CT-based value(s) measured; metrics used to measure COVID-19 outbreaks; and the outcomes of the study.

All analyses were conducted in Python (31) using the following packages: pandas, matplotlib, plotly, scipy and statsmodels. The daily median CT value among Dascena test samples, the daily R(t) in the El Paso MSA, and the daily count of hospitalized individuals with COVID-19 in El Paso were plotted over time. Rolling 7-day averages of the daily median CT value (with a minimum of 5 days of data present in the window), the daily R(t), the daily number of COVID-19 hospitalizations, and the daily percent positivity among samples from El Paso sent to the Dascena Laboratory were also plotted over time. To better capture the dynamic change in percent positivity among Dascena test samples, the daily change in percent positivity was calculated from the 7-day rolling average for days with more than 200 total tests performed by the Dascena Laboratory. If fewer than 200 tests were performed on a particular day (e.g., due to holiday shutdown of collection sites), the percent positivity from the previous day was carried forward. The daily change in percent positivity was then plotted over time.
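The smoothing and carry-forward rules described above can be sketched with pandas as follows. This is an illustrative reading of the text rather than the authors' implementation. The column names `n_tests` and `n_positive` are hypothetical, a day with exactly 200 tests is assumed to be valid, and the order of operations (carry forward, then 7-day average, then day-to-day difference) is an assumption, since the text does not spell it out.

```python
import pandas as pd

MIN_TESTS = 200  # days below this volume reuse the previous day's positivity


def rolling_median_ct(median_ct: pd.Series) -> pd.Series:
    """7-day rolling average of daily median CT, needing at least 5 days of data."""
    return median_ct.rolling(window=7, min_periods=5).mean()


def positivity_change(daily: pd.DataFrame) -> pd.DataFrame:
    """Daily change in the 7-day rolling average of percent positivity."""
    out = daily.copy()
    out["pct_pos"] = 100.0 * out["n_positive"] / out["n_tests"]
    # Blank out low-volume days, then carry the previous day's value forward.
    out["pct_pos"] = out["pct_pos"].where(out["n_tests"] >= MIN_TESTS).ffill()
    # Assumed: a full 7-day window is required for the positivity average.
    out["pct_pos_7d"] = out["pct_pos"].rolling(window=7).mean()
    out["pct_pos_change"] = out["pct_pos_7d"].diff()
    return out
```

Both functions expect one row per calendar day in chronological order; gaps in the date index would need to be filled (for example with `asfreq("D")`) before the rolling windows are meaningful.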
Scatterplots and linear regression were used to evaluate possible associations between the daily median CT value (N gene) and daily R(t), between the daily median CT value (N gene) and the daily count of COVID-19 hospitalizations, and between the daily median CT value (N gene) and the daily change in percent positivity among samples processed by Dascena. Since a significant time delay was observed between changes in the daily median CT value (N gene) and the daily count of COVID-19 hospitalizations, a time lag of 33 days was applied to the hospitalization data prior to creating the scatterplot and conducting linear regression. The median CT value based on the N gene was selected because it has previously been cited in research on population CT values (24, 30). In order to evaluate whether a time delay existed between changes in the daily median CT value (N gene) and community outbreaks, cross-correlation plots were constructed between daily median CT value and daily R(t), between daily median CT value and the daily count of patients hospitalized with COVID-19, and between daily median CT value and the daily change in percent positivity. In brief, a cross-correlation coefficient was obtained by dividing the correlation between two signals by the product of the auto-correlations of the two signals. The lag at which the cross-correlation coefficient reaches its maximum (the argmax) is the dominant lag time between the two signals.
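A minimal sketch of the dominant-lag estimate and the lagged regression is given below, using scipy. It is not the authors' code. Two assumptions are made: the signals are z-scored and the median CT is negated (as described for the cross-correlation plots), and the coefficient is normalized by the square root of the product of the zero-lag auto-correlations, which is the conventional form and may differ from the exact normalization the authors used. The function names are hypothetical.

```python
import numpy as np
from scipy import signal, stats


def zscore(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def dominant_lag(median_ct, outbreak):
    """Return (lag in days, coefficient) at the cross-correlation peak.

    A positive lag means the outbreak signal follows the median CT signal.
    """
    a = -zscore(median_ct)  # negated so that a CT trough becomes a peak
    b = zscore(outbreak)
    xcorr = signal.correlate(b, a, mode="full")
    lags = signal.correlation_lags(len(b), len(a), mode="full")
    # Assumed normalization: sqrt of the product of zero-lag auto-correlations.
    xcorr = xcorr / np.sqrt(np.dot(a, a) * np.dot(b, b))
    best = int(np.argmax(xcorr))
    return int(lags[best]), float(xcorr[best])


def lagged_regression(median_ct, outbreak, lag):
    """Regress the outbreak signal, shifted back by `lag` days, on median CT."""
    x = np.asarray(median_ct, dtype=float)
    y = np.asarray(outbreak, dtype=float)
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        x, y = x[-lag:], y[:lag]
    return stats.linregress(x, y)
```

In the study's terms, `lagged_regression(median_ct, hospitalizations, 33)` would correspond to applying the 33-day lag to the hospitalization series before fitting; the returned object exposes `slope`, `rvalue` and `pvalue`.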
As the purpose of the present analysis was to investigate how the trough of daily median CT correlated with the peak of the other signals, the following modifications were made to aid visualization: (1) for each signal, the z-score was used instead of the absolute value; (2) the negative value of the z-score of daily median CT value was used to ensure a positive peak in the cross-correlation plots; (3) 20% of positive samples were randomly sampled five times each day to estimate the variation in the cross-correlation between daily median CT value and epidemiological signals. A 1-sample t-test was used to determine whether the mean lag differed significantly from zero. Pairwise comparisons were performed with Pearson's correlation (significance P < 0.05) to determine if any demographic factors associated with testing samples were significantly associated with R(t), COVID-19 hospitalization count, or percent positivity. The following demographic factors were investigated: daily number of tests; daily median age; daily percent of samples from men; daily percent of samples from individuals indicating White race; daily percent of samples from individuals indicating Hispanic ethnicity.

In the greater El Paso area, 148,410 COVID-19 tests were sent to the Dascena Laboratory for processing, and 36,306 tests were positive. Of these samples, 147,720 (99.54%) were nasopharyngeal swabs, 28 (0.02%) were salivary samples, 1 (<0.01%) was an anterior nares swab, and 661 (0.45%) were biological specimens for which the type of specimen was not recorded. The median CT value (N gene) for nasopharyngeal samples was 23.14, which differed significantly from the median CT value (N gene) of 25.58 observed for all other sample types (p < 0.05, Mood's median test). The demographic characteristics of the entire population tested for COVID-19 are presented in Table 1.
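The resampling step in the Methods (drawing 20% of each day's positive samples five times, recomputing the daily median CT, and testing the resulting lags against zero) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the column names `collection_date` and `ct_n`, the function names, and the fixed random seed are hypothetical, and the lag is taken at the peak of the cross-correlation between the negated, z-scored median CT and the z-scored outbreak signal.

```python
import numpy as np
import pandas as pd
from scipy import signal, stats


def peak_lag(median_ct, outbreak):
    """Lag (days) at which the outbreak signal best follows the CT trough."""
    a = np.asarray(median_ct, dtype=float)
    b = np.asarray(outbreak, dtype=float)
    a = -(a - a.mean()) / a.std()  # negate so a CT trough becomes a peak
    b = (b - b.mean()) / b.std()
    xcorr = signal.correlate(b, a, mode="full")
    lags = signal.correlation_lags(len(b), len(a), mode="full")
    return int(lags[np.argmax(xcorr)])


def resampled_lags(positives, outbreak, n_draws=5, frac=0.2, seed=0):
    """Estimate the lag n_draws times from per-day random subsamples."""
    rng = np.random.default_rng(seed)
    lags = []
    for _ in range(n_draws):
        draw_seed = int(rng.integers(0, 2**32 - 1))
        # Draw the same fraction of positive samples within every day.
        sub = positives.groupby("collection_date").sample(
            frac=frac, random_state=draw_seed
        )
        med = sub.groupby("collection_date")["ct_n"].median().sort_index()
        # Keep only days present in both series.
        med, out = med.align(outbreak.sort_index(), join="inner")
        lags.append(peak_lag(med.to_numpy(), out.to_numpy()))
    return np.array(lags)


def lag_ttest(lags):
    """One-sample t-test of whether the mean lag differs from zero."""
    return stats.ttest_1samp(lags, popmean=0.0)
```

With only five draws the test has four degrees of freedom, so the p-value is sensitive to the spread of the individual lag estimates; if all draws return the same lag, the standard deviation is zero and the t-statistic is not meaningful.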
Variability over time was observed in the median CT values and measures of COVID-19 disease dynamics in El Paso (Figure 1). As predicted based on the a priori hypothesis, the daily median CT was negatively correlated with the daily R(t), the daily count of COVID-19 hospitalizations (with a time delay), and the daily change in percent positivity among testing samples in the greater El Paso area (Figure 2). [...] Figure 3). While visual inspection of the daily median CT, daily R(t) and percent positivity plots over time (Figure 1) suggested that peaks in R(t) and percent positivity followed a trough in median CT, no statistically significant time delays were detected between median CT and change in percent positivity or R(t).

Abs. Lag (smoothed) is the absolute time difference between the peak of each epidemiological signal and the trough of daily median CT with a 7-day rolling average (red dots in Figure 1b). Mean XCor Lag and SD XCor Lag represent the mean and standard deviation, respectively, of the lags determined by the 5-fold sampling of daily median CT and cross-correlation. P-val is the p-value from a 1-sample t-test of whether the cross-correlation lag between daily median CT and each of the epidemiological signals differs from zero.

Pairwise comparisons revealed that some demographic factors of testing samples were associated with COVID-19 outbreak measures (Table 2).
No other factors, including the number of tests performed, median age, or percentage of tests from male, Hispanic or White individuals each day, were significantly correlated with daily R(t), daily difference in positivity rate or daily count of COVID-19 hospitalizations in the El Paso area. The dataset in this study was substantially larger than those used in comparator studies, but differed in that it was not a surveillance sample. Instead, this study used samples from individuals who required testing due to the presence of COVID-19 symptoms, or who required testing in the absence of symptoms (e.g., for work or travel clearance). Median CT value was the most common measure of population distribution of CT values across research studies to date, and R(t) and percent positivity were the most common outbreak measures (Table 3).

At present, surges are largely predicted based on observed local case and mortality rates, which may lag several weeks behind changes in transmission rates or be obscured by changes in testing capacity (23). Given the ubiquitous availability of CT data and the pressing nature of the pandemic, interest has risen in exploring the possibility that the population distributions of CT values can be used as indicators for local outbreaks.
The current study adds to the growing literature on this topic by providing an analysis of median CT values from samples collected from an entire geographical area, and contextualizing the results with a comparison to other research investigating the application of population CT values. In the greater El Paso area, daily median CT values were found to be negatively correlated with the daily percent positivity among samples, the daily R(t) extracted from community case rates and the daily count of COVID-19 hospitalizations (with a delay). Of note, these associations were not observed in supplementary analyses of other Texas MSAs (Supplementary Table 1). In the MSAs evaluated in supplementary analyses, there appeared to be greater day-to-day variability in the median CT values over time rather than consistent trends, potentially reflecting differences in the strength of the signal that could be detected. In addition, substantial differences in the study populations may have contributed to the variable significance of the relationship between median CT value and outbreak measures between study sites. This hypothesis is supported by the observation of significant demographic differences between the El Paso MSA and the Texas MSAs evaluated in the supplementary analyses (Supplementary Table 2). This observation indicates that certain qualities of the datasets used to measure population CT values may be important to their utility in approximating local COVID-19 surges. Changes in the population distribution of CT values significantly preceded a rise in COVID-19 hospitalizations in El Paso. However, contrary to the a priori hypothesis that changes in CT values would precede surges, the cross-correlation plots of median CT value, percent positivity and R(t) did not strongly demonstrate such a relationship. It therefore remains unclear from the data whether changes in the population distribution of CT preceded changes in community transmission, or vice versa.
Other studies evaluating population CT values in surveillance samples have reported that changes in CT values may precede traditional signs of an outbreak (23, 24). The inclusion of symptomatically indicated tests in the sample population may have influenced this association, such that a decline in CT values may be more closely linked to current case rates. Strengths of this study include that all RT-PCR analyses were conducted at a single laboratory using standardized testing protocols, and that large samples of positive COVID-19 tests were acquired for the study site. The vast majority (>99%) of samples were nasopharyngeal swabs, such that differences in median CT values based on sample type likely did not impact results. This study was not limited to a single medical center, but included samples collected from an entire geographical area. This research compared median CT values to R(t) and hospitalization count, benchmarks traditionally used in public health to define surges, providing greater validity than would be possible with only an internal comparison of different metrics of testing sample data. In addition, this study provided a novel examination of the features of RT-PCR testing data which may contribute to and affect the usability of population-level metrics of CT values for predicting disease dynamics in the community. While the study sample was large, other variables and forms of bias (e.g., sampling bias) may have influenced the results.
Indeed, differences in the comprehensiveness of the El Paso dataset versus the supplementary site datasets (in other words, the relative proportion of tests conducted by the Dascena laboratory versus other testing providers) may have contributed to skew in the supplementary samples. Future directions for research on population CT values may therefore include analyzing whether significant differences in results can be detected in different sub-samples of tested populations, and evaluating methods to collate CT data across testing providers in a given geographic area. No data on symptomatology were associated with samples at the time of collection, such that these data do not enable a distinction between samples collected as part of clinical evaluation of symptoms consistent with COVID-19 and samples collected for other reasons (e.g., clearance for work or travel). Prior research assessing the population distribution of CT values in relation to community outbreaks has explicitly used surveillance samples (23, 24). The variability in the observed correlations between median CT and outbreak measures in El Paso versus other testing locations may reflect, in part, variability in the proportion of symptomatically indicated versus non-symptomatically indicated tests in a given location. However, other differences between the testing site populations may also have contributed to the observed variability in the relationship between median CT value and outbreak measures, such as differences in the demographics of the tested population. The research question of whether median CT values derived from all testing data, versus only surveillance testing data, may be reliably used to predict disease outbreaks remains unresolved, and can only be addressed using datasets in which symptomatology at the time of testing or reason for testing may be linked to test results.
The samples used in this study were not collected expressly for the purposes of public health surveillance or research, and so the demographic composition of the sampled population varied day to day. As indicated by Table 2, some aspects of the daily demographic composition of the tested population were found to correlate with epidemiological outcomes. Daily variability in the sampled population may therefore translate to variability in the strength of the associations between median CT and measures of disease dynamics. However, these associations may also reflect underlying epidemiological trends, such as the disproportionately high rates of COVID-19 infection among Hispanic individuals (32), including during outbreaks. Additional research with real-world samples may build on this work by further exploring the relevance of demographic factors to the accuracy and utility of population CT measures. As national, state and local authorities continue to refine public health programs to track and contain the spread of SARS-CoV-2, it is imperative to optimize methods for predicting surges in community transmission. Greater lookahead time would enable local and state officials to enact public health policies to mitigate an anticipated surge, and would provide health systems with the opportunity to initiate changes to their standard operating procedures, including activating reserve clinical personnel, procuring additional resources to the extent possible, and converting facilities to support additional patient flow.
The population distribution of CT values, as measured by the median CT value, is a potential indicator for local outbreaks that merits further investigation.

CFT processed the data, adapted the software code, conducted statistical analyses, generated figures, contributed to drafting the manuscript, and participated in critically reviewing and editing the manuscript. AG obtained and organized the data for the study, reviewed software and statistical analyses, and contributed to the primary drafting and editing of the manuscript. AGS contributed to critical review of study design and analyses, and to drafting and editing the manuscript. QM and RD formulated the idea for this study, supervised analyses and critically reviewed and edited the manuscript.
84%) ** 17,921 (52.28%) ** 9,389 (56.12%) ** 203 (35.17%) 30,599 (69.07%) ** 9,604 (28.02) ** 3,742 (22.37%) ** Ethnicity Hispanic

COVID-19 Dashboard by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU)
US Food and Drug Administration. A Closer Look at COVID-19 Diagnostic Testing
COVID-19 Real-Time Learning Network, by the Centers for Disease Control and Prevention and the Infectious Disease Society of America
Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR. Eurosurveillance
To Interpret the SARS-CoV-2 Test, Consider the Cycle Threshold Value
Accelerated Emergency Use Authorization (EUA) Summary Modified Thermo Fisher TaqPath COVID-19 SARS-CoV-2 Test
Viral cultures for COVID-19 infectious potential assessment - a systematic review
COVID-19 Outbreak and Presymptomatic Transmission in Pilgrim Travelers Who Returned to Korea from Israel
Epidemiological Correlates of PCR Cycle Threshold Values in the Detection of SARS-CoV-2. Clin Infect Dis Off Publ Infect Dis Soc Am
Viral kinetics of SARS-CoV-2 over the preclinical, clinical, and postclinical period
Duration of infectiousness and correlation with RT-PCR cycle threshold values in cases of COVID-19
Predicting Infectious Severe Acute Respiratory Syndrome Coronavirus 2 From Diagnostic Samples
A Systematic Review of the Clinical Utility of Cycle Threshold Values in the Context of COVID-19
Viral RNA load as determined by cell culture as a management tool for discharge of SARS-CoV-2 patients from infectious disease wards
Initial Viral Load of a COVID-19-Infected Case Indicated by its Cycle Threshold Value of Polymerase Chain Reaction Could be used as a Predictor of its Transmissibility - An Experience from Gujarat, India
Correlation Between 3790 Quantitative Polymerase Chain Reaction-Positives Samples and Positive Cell Cultures, Including 1941 Severe Acute Respiratory Syndrome Coronavirus 2 Isolates
Viral and Antibody Kinetics of COVID-19 Patients with Different Disease Severities in Acute and Convalescent Phases: A 6-Month Follow-Up Study
SARS CoV-2 Surveillance and Exposure in the Perioperative Setting with Universal testing and Personal Protective Equipment (PPE) Policies
Viral dynamics in mild and severe cases of COVID-19
SARS-CoV-2 PCR cycle threshold at hospital admission associated with patient mortality
Misinterpretation of viral load in COVID-19. medRxiv
Challenges and Controversies to Testing for COVID-19
Estimating epidemiologic dynamics from single cross-sectional viral load distributions. medRxiv
Viral load in community SARS-CoV-2 cases varies widely and temporally. medRxiv
Diagnostic techniques for COVID-19 and new developments
Compensation Programs - Medical Fee Schedule
Real time bayesian estimation of the epidemic potential of emerging infectious diseases
Texas COVID-19 Data. Texas Department of State Health Services
Declining Trend in the Initial SARS-CoV-2 Viral Load During the Pandemic: Preliminary Observations from Detroit
The Python Language Reference - Python 3.9.1 documentation [Internet]. The Python Software Foundation
Racial and Ethnic Disparities in COVID-19-Related Infections, Hospitalizations, and Deaths