key: cord-0802276-n2e061xq authors: Owokotomo, O. E.; Manda, S.; Kasim, A.; Claesen, J.; Shkedy, Z.; Reddy, T. title: Modelling the positive testing rate of COVID-19 in South Africa Using A Semi-Parametric Smoother for Binomial Data date: 2020-11-15 journal: nan DOI: 10.1101/2020.11.11.20230250 sha: 0146947ecf64cff5579d9a5d4a523bcdc932c21a doc_id: 802276 cord_uid: n2e061xq

The current outbreak of COVID-19 is a major pandemic that has shaken up the entire world in a short time. South Africa has the highest number of COVID-19 cases in Africa, and understanding the country's disease trajectory is important for government policy makers to plan the optimal COVID-19 intervention strategy. The number of cases is highly correlated with the number of COVID-19 tests undertaken. Thus, current methods of understanding the COVID-19 transmission process in the country that are based only on the number of cases can be misleading. In light of this, we propose to estimate both the probability of positive cases per tests conducted (the positive testing rate) and the rate at which the positive testing rate changes over time (its derivative) using a flexible semi-parametric model. We applied the method to the observed positive testing rate in South Africa with data obtained from March 5th to September 2nd 2020. We found that the positive testing rate declined from early March, when the disease was first observed, until early May, after which it kept increasing. In July 2020 the infection reached its peak and then started to decrease again, indicating that the intervention strategy is effective. From mid August 2020, the rate of change of the positive testing rate indicates that the decline in the positive testing rate is slowing down, suggesting that a less effective intervention is currently implemented.

[...] term prediction for the number of COVID-19 cases has become a central tool for policy makers to design intervention strategies in order to control the disease's spread.
Recently, Reddy et al. (2020) proposed a robust model-based approach that requires no assumptions about the transmission process to model the number of COVID-19 cases and to provide a short-term prediction for 5-10 days ahead. These non-linear epidemiological models have previously been applied [...] analysis by Shen (2020), a similar approach was taken to estimate the key epidemic parameters for all 11 provinces in China as well as 9 selected countries.

All models discussed above made use of the daily or cumulative number of cases to estimate the models and the parameters of interest. In the context of COVID-19, this introduces a difficulty, as seen in Figure 1, since in South Africa (and many other countries) the number of tests and the number of cases are correlated (Reddy et al., 2020).

medRxiv preprint doi: https://doi.org/10.1101/2020.11.11.20230250; this version posted November 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

To overcome the problem that the number of cases depends on the number of tests, we propose an alternative modeling approach that focuses on the COVID-19 positive testing rate, i.e., the probability of positive cases per tests conducted. In this paper we model the daily number of COVID-19 cases among the number of tests carried out using a semi-parametric model in which the rate of change of the positive testing rate is estimated using a smooth function of time. In particular, we apply scatterplot smoothing techniques for binomial data using generalized additive models in order to obtain an estimate of the rate of change (Ruppert et al., 2003). In Section 2 we describe the testing policy in South Africa from which the data used for the analysis presented in this paper [...] September 2nd 2020 is presented in Figure 2.
The growth of COVID-19 infections in South Africa appears to be tri-phasic, especially during the early phase when the cumulative cases were low, with rapid growth until March 27th 2020. A total of 243 daily new cases were observed, followed by a sharp decline in the rate of new cases. From March 28th 2020 to April 6th 2020 the daily increase in cases was consistently below 100. From May 2020 onwards, a consistent increase of more than 1000 cases per day was observed. The peak period was between July 9th and 19th, when more than 10,000 cases were reported daily. As of July, a total of 3,726,721 tests had been conducted, corresponding to a testing rate of 22.816 per 1000 population. Throughout this period, the proportion of infections increased until mid July, when it started to decrease.

All rights reserved. No reuse allowed without permission.

The number of positive cases is assumed to be binomially distributed. Let $\pi_t$ be the daily positive testing rate per test, $Y_t$ be the daily number of cases and $n_t$ be the daily number of tests. Our aim is to model the probability $\pi_t$ and to produce a model-based estimate for its first derivative, i.e., the [...]

where $s_k(x)$ is a set of spline basis functions. To avoid overfitting, the spline model is typically estimated by considering penalized max- [...]

with $\boldsymbol{\pi} = [\pi_1, \pi_2, \ldots, \pi_T]^T$, $\boldsymbol{\beta} = [\beta_0, \beta_1, \ldots, \beta_d]^T$, and $\mathbf{u} = [u_1, u_2, \ldots, u_K]^T$.
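As an illustration of the binomial spline model and its derivative, the following is a minimal sketch rather than the authors' implementation. It assumes a logit link, a truncated-line basis $(t - \kappa_k)_+$ with hypothetical knots, and a fixed ridge penalty `lam` on the spline coefficients; none of these choices is confirmed by the text, and in a mixed-model fit the penalty would be estimated rather than fixed. All function names are hypothetical.

```python
import numpy as np


def truncated_line_basis(t, knots):
    """Return X = [1, t] and Z = [(t - kappa_k)_+] (assumed basis)."""
    t = np.asarray(t, dtype=float)
    knots = np.asarray(knots, dtype=float)
    X = np.column_stack([np.ones_like(t), t])
    Z = np.maximum(t[:, None] - knots[None, :], 0.0)
    return X, Z


def fit_penalized_binomial(y, n, t, knots, lam=1.0, n_iter=100, tol=1e-8):
    """Penalized IRLS for logit(pi_t) = X beta + Z u, ridge penalty on u only."""
    y = np.asarray(y, dtype=float)
    n = np.asarray(n, dtype=float)
    X, Z = truncated_line_basis(t, knots)
    C = np.hstack([X, Z])
    # Penalty matrix: zeros for the fixed effects beta, ones for u.
    D = np.diag([0.0] * X.shape[1] + [1.0] * Z.shape[1])
    p0 = y.sum() / n.sum()
    theta = np.zeros(C.shape[1])
    theta[0] = np.log(p0 / (1.0 - p0))  # start at the overall logit rate
    for _ in range(n_iter):
        eta = C @ theta
        pi = 1.0 / (1.0 + np.exp(-eta))
        w = np.maximum(n * pi * (1.0 - pi), 1e-10)  # binomial variances
        z = eta + (y - n * pi) / w                  # working response
        theta_new = np.linalg.solve(C.T @ (w[:, None] * C) + lam * D,
                                    C.T @ (w * z))
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    return theta


def rate_and_derivative(theta, t, knots):
    """Fitted pi_t and d pi_t / dt = pi_t (1 - pi_t) * d eta_t / dt."""
    t = np.asarray(t, dtype=float)
    knots = np.asarray(knots, dtype=float)
    X, Z = truncated_line_basis(t, knots)
    eta = np.hstack([X, Z]) @ theta
    pi = 1.0 / (1.0 + np.exp(-eta))
    # Slope of the linear predictor: beta_1 plus u_k for every knot passed.
    d_eta = theta[1] + (t[:, None] > knots[None, :]).astype(float) @ theta[2:]
    return pi, pi * (1.0 - pi) * d_eta


# Noise-free toy data (not the South African data): logit(pi_t) = -3 + 2 t.
t = np.linspace(0.0, 1.0, 60)
knots = np.linspace(0.1, 0.9, 9)
n = np.full(t.shape, 5000.0)
true_pi = 1.0 / (1.0 + np.exp(-(-3.0 + 2.0 * t)))
y = n * true_pi
theta = fit_penalized_binomial(y, n, t, knots, lam=1.0)
pi_hat, dpi_hat = rate_and_derivative(theta, t, knots)
```

Because the toy rate is exactly linear on the logit scale, the penalized fit should recover it with all spline coefficients at zero; a positive value of `dpi_hat` indicates a rising positive testing rate and a negative value a falling one.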
Note that $\boldsymbol{\beta}$ and $\mathbf{u}$ are vectors of the fixed and random effects, respectively, with $u_k \sim N(0, \sigma_u^2)$, where $\sigma_u^2$ acts as the [...]

The estimation of the model (3) is performed by means of penalized quasi-likelihood (PQL). Initial estimates for $\boldsymbol{\beta}$ and $\mathbf{u}$ are used to calculate the pseudo-data $\mathbf{y}^*$:

$$\mathbf{y}^* = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{W}^{-1}(\mathbf{y} - \boldsymbol{\pi}) \equiv \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon}^*,$$

where $\mathbf{W}$ is a diagonal matrix with the variances of $y_i$ on the diagonal. The pseudo-error $\boldsymbol{\varepsilon}^*$ [...]

Once the positive testing rate, $\pi_t$, is estimated according to Equation (1), we can estimate the rate of change in the positive testing rate over time using the derivative of $\pi_t$ given by [...]

[...] assume that the number of tests will be constant nor that the tests will be applied to a random sample from the population. Also in this case, the derivative can provide a good indication of the general trend of the virus' transmission for the tested population and can be used as a tool to assess the success of an implemented intervention strategy. [...]

[...] about the positive testing rate of COVID-19 is important to ensure prediction of the disease trajectory, optimal resource allocation and better understanding of the transmission process. In the current study we modelled the COVID-19 cases out of the number of tests as a function of time using a semi-parametric approach. This approach allows us to adjust for, or take into account, the number of tests performed, which when ignored may lead to erroneous conclusions.
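A small worked example may make the point about test volume concrete. The numbers below are invented for illustration and are not taken from the South African data; the helper name is hypothetical. The underlying positive testing rate is held constant while the number of tests doubles each day.

```python
import numpy as np


def summarize(cases, tests):
    """Return day-on-day growth in case counts and the daily positive rate."""
    cases = np.asarray(cases, dtype=float)
    tests = np.asarray(tests, dtype=float)
    case_growth = cases[1:] / cases[:-1]  # ratio of consecutive daily counts
    positive_rate = cases / tests         # cases per test conducted
    return case_growth, positive_rate


# Hypothetical data: testing doubles daily, 10% of tests are positive.
tests = np.array([1000.0, 2000.0, 4000.0, 8000.0])
cases = 0.10 * tests
case_growth, positive_rate = summarize(cases, tests)
```

Here the case counts double every day, which on their own would suggest an accelerating outbreak, while the positive testing rate stays flat at 0.10.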
Also, this method allows us to overcome the problem of modelling the number of cases alone and to take [...]

[...] average is another measure that can be used to understand the rate of infection, but unlike the positive testing rate, the moving average commonly uses partial information since there is always loss of information on both tails. In our case, the same result was obtained using both measures.

The rate of infection can be used as an indicator for the evolution of the outbreak over time and to reveal new trends in the outbreak. One could also extend our approach by modeling jointly [...]

A novel sub-epidemic modeling framework for short-term forecasting epidemic waves.
Simultaneous mapping of multiple gene loci with pooled segregants.
Turning points, reproduction number, and impact of climatological events for multi-wave dengue outbreaks.
The positive rate: A crucial metric for understanding the pandemic.
R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.
South Africa's trajectory to 100000 cases and what lies ahead: data-driven, real-time prediction of total number of reported cases and deaths. Submitted for publication to BMC Medical Research Methodology.
Short-term forecasts of the COVID-19 epidemic in Guangdong and Zhejiang, China.
Semi Parametric Regression.