key: cord-0021313-def9nr9x authors: Giles, John R.; Peel, Alison J.; Wells, Konstans; Plowright, Raina K.; McCallum, Hamish; Restif, Olivier title: Optimizing noninvasive sampling of a zoonotic bat virus date: 2021-08-27 journal: Ecol Evol DOI: 10.1002/ece3.7830 sha: 714ed3717c84efa60fa71df72977b0d92df36d99 doc_id: 21313 cord_uid: def9nr9x
Outbreaks of infectious viruses resulting from spillover events from bats have brought much attention to bat-borne zoonoses, which has motivated increased ecological and epidemiological studies on bat populations. Field sampling methods often collect pooled samples of bat excreta from plastic sheets placed under roosts. However, positive bias is introduced because multiple individuals may contribute to pooled samples, making studies of viral dynamics difficult. Here, we explore the general issue of bias in spatial sample pooling using Hendra virus in Australian bats as a case study. We assessed the accuracy of different under-roost sampling designs using generalized additive models and field data from individually captured bats and pooled urine samples. We then used theoretical simulation models of bat density and under-roost sampling to understand the mechanistic drivers of bias. The most commonly used sampling design estimated viral prevalence 3.2 times higher than individual-level data, with positive bias 5–7 times higher than other designs due to spatial autocorrelation among sampling sheets and clustering of bats in roosts. Simulation results indicate that using a stratified random design to collect 30–40 pooled urine samples from 80–100 sheets, each with an area of 0.75–1 m², would allow estimation of true prevalence with minimum sampling bias and false negatives. These results show that widely used under-roost sampling techniques are highly sensitive to viral presence but lack specificity, providing limited information regarding viral dynamics.
Improved estimation of true prevalence can be attained with minor changes to existing designs such as reducing sheet size, increasing sheet number, and spreading sheets out within the roost area. Our findings provide insight into how spatial sample pooling is vulnerable to bias for a wide range of systems in disease ecology, where optimal sampling design is influenced by pathogen prevalence, host population density, and patterns of aggregation.
Recent emergence of bat-borne viruses has motivated an increase in ecological and epidemiological studies on bat populations at the global scale (Calisher et al., 2006; Halpin et al., 2007; Wang & Cowled, 2015). Initial efforts focused on discovering the reservoir host(s) of these emerging infections (Breman et al., 1999; Chua et al., 2002; Halpin et al., 2000; Jayme et al., 2015; Li et al., 2005; Towner et al., 2009) and identifying other potential viral zoonoses in bats (Drexler et al., 2012; Quan et al., 2013; Smith & Wang, 2013). However, less work has been done to describe the dynamics of viruses in bat populations in time and space. Spatiotemporal sampling is therefore critical to provide insights into the broader ecological context surrounding spillover and to understand the factors that lead to the emergence of bat-borne viral diseases in humans. A common approach in bat-borne disease research involves the capture of many individual bats repeatedly over time, where bats are sampled (e.g., serum, urine, feces, saliva) and tested for viral presence using serology or PCR. In the best-case scenario, repeated (longitudinal) samples are obtained from individuals, enabling description of dynamics at the individual level. Individual-level longitudinal data are rare and are most often available for high-fidelity cave-roosting bats, which can be recaptured at the same roosting site (Streicker et al., 2012; Towner et al., 2009).
Longitudinal data are very difficult to gather from tree-roosting megachiroptera, such as the highly mobile nomadic foragers of the genera Pteropus and Eidolon; therefore, individual-level sampling in this context is typically done cross-sectionally over time. Moreover, catching individual canopy-roosting bats is logistically challenging and expensive, so sample sizes are often too small to detect pathogens that circulate at low prevalence. Consequently, much research has supplemented the capture of individual bats with a noninvasive technique, referred to as "under-roost sampling," that uses plastic sheets to collect pooled samples of bat excreta (e.g., urine and feces) under bat roosts (Baker et al., 2012, 2013; Bourgarel et al., 2018; Chua, 2003; Chua et al., 2001, 2002; Field et al., 2011, 2015; Lim et al., 2019; Lima et al., 2013; Marsh et al., 2012; Memish et al., 2013; Mendenhall et al., 2019; Peel et al., 2019; Pritchard et al., 2006; Smith et al., 2011; Valitutto et al., 2020; Wacharapluesadee et al., 2010). Under-roost sheet sampling was initially implemented in 1998 to isolate Nipah and Tioman viruses from urine collected from Pteropus hypomelanus and P. vampyrus in Malaysia (Chua, 2003; Chua et al., 2001, 2002). It has subsequently been widely adopted to study coronaviruses (Bourgarel et al., 2018; Lim et al., 2019; Lima et al., 2013; Memish et al., 2013; Mendenhall et al., 2019; Valitutto et al., 2020) and henipaviruses (Baker et al., 2012, 2013; Field et al., 2011, 2015; Marsh et al., 2012; Peel et al., 2019; Pritchard et al., 2006; Smith et al., 2011; Wacharapluesadee et al., 2010) in bat populations. The most salient complication of under-roost sampling is that it only provides indirect measures of viral prevalence; that is, viral presence-absence is recorded for a group of bats roosting above a sampling sheet during a certain time period.
In this scenario, samples consist of urine droplets or fecal particles from an "area" that may be pooled to constitute sufficient volume for an array of molecular assays (i.e., PCR or viral isolation). Although this is a necessary compromise, the clustered nature and fluctuations of bat density within a roost may confound results because an unknown and variable number of individuals contribute to a sample. Given these confounding effects, under-roost sampling as it is commonly implemented may introduce systematic sampling bias in the form of increased sensitivity to detecting virus. The increased sensitivity of pooled samples in disease surveillance is well known. Sample pooling was first used during World War II to avoid the "expensive and tedious" process of monitoring syphilis in US soldiers (Dorfman, 1943). It has since been used as a cost-effective method to screen for HIV infection in developing countries (Behets et al., 1990; Litvak et al., 1994), and more recently, it has been employed to increase the efficiency of detecting cases of SARS-CoV-2 infection in the ongoing COVID-19 pandemic (Aragón-Caqueo et al., 2020; Griesemer et al., 2020; Narayanan et al., 2020). Pooled sample testing is also common in surveillance of agricultural diseases of livestock (Arnold et al., 2005; Christensen & Gardner, 2000), poultry (Arnold et al., 2009; Fereidouni et al., 2012), and aquaculture (Laurin et al., 2019), where a pooled sample is used to determine the presence or absence of a disease within a closed population. The resource efficiency of such pooled sampling techniques stems from heightened sensitivity of quantitative PCR tests within each sample (Muniesa et al., 2014; Muñoz-Zanzi et al., 2006). In this regard, pooled sampling is well suited for disease surveillance because the higher sensitivity is advantageous when pathogen prevalence is very low and access to individuals in the population is hindered.
The high sensitivity of pooled samples, however, becomes problematic when used to estimate prevalence (Cowling et al., 1999), a classic statistical problem resulting from data aggregation, often referred to as the "ecological fallacy" (Robinson, 2009). In the context of under-roost sampling to estimate prevalence of a bat virus, this sampling conundrum is inherited from the initial application of the under-roost sampling technique, which was to detect and isolate viral agents, not necessarily to study viral dynamics. Still, a few recent studies have employed the technique to describe temporal patterns in viral prevalence (Páez et al., 2017; Peel et al., 2019; Wacharapluesadee et al., 2010); however, the extent to which the data are vulnerable to sampling bias has not been explored. Here, we contribute the first modeling study to theoretically explore the application of under-roost sheet sampling to estimating viral prevalence in tree-roosting bat populations and quantify the potential sampling bias introduced by different sampling regimes. We focus on tree-roosting pteropodid bats because they are reservoir hosts of henipaviruses, which constitute public health risks across Africa, Asia, and Australia; given their highly mobile population structure, under-roost sampling techniques are especially useful but also prone to bias due to sample pooling. To show the extent of estimation bias resulting from sample pooling on pathogen prevalence, we fit generalized additive models (GAMs) to previously published data on variation in Hendra virus prevalence over time in Australian fruit bats at the individual level and at two levels of sample pooling. We then developed spatial simulation models of bat density within a tree roost and of under-roost sampling designs, and performed a global sensitivity analysis to assess which aspects of under-roost sampling impact sampling bias most strongly.
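The tension between sensitivity and prevalence estimation described above can be made concrete with a short calculation. The following is a minimal sketch in Python (the function name is hypothetical, and the study's own analyses were carried out in R), assuming that every contributor to a pool is infected independently with the same probability and that the assay is perfect. Under those assumptions, a pool tests positive whenever at least one contributor is infected.

```python
def pool_positive_probability(prevalence: float, n_contributors: int) -> float:
    """Probability that a pooled sample tests positive.

    Assumes independent infection status among contributors and a
    perfect assay; both are simplifying assumptions for illustration.
    """
    # The pool is negative only if every contributor is uninfected.
    return 1.0 - (1.0 - prevalence) ** n_contributors


if __name__ == "__main__":
    true_prevalence = 0.1
    for k in (1, 2, 5, 10):
        p_pos = pool_positive_probability(true_prevalence, k)
        print(f"{k:2d} contributors -> P(pool positive) = {p_pos:.3f}")
```

With a true prevalence of 0.1, the proportion of positive pools rises from 0.1 for a single contributor to roughly 0.19, 0.41, and 0.65 for 2, 5, and 10 contributors, which is why the fraction of positive pooled samples overstates individual-level prevalence.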
Our GAM results show that pooling of urine samples collected with the underroost sampling method leads to overestimation of the prevalence of virus. Further, we show that our simulation model elucidates the mechanistic drivers of estimation bias and provides recommendations on how to optimize under-roost sampling for the surveillance of infectious bat viruses by minimizing bias and maintaining sufficient detection rates. To assess potential sampling bias in estimates of the temporal fluctuations in viral prevalence resulting from pooled samples, we fitted generalized additive models (GAMs; Wood, 2006) to existing "presence-absence" field data of virus detections. From these data, we modelled the probability of viral presence as the response variable and sampling date as predictor variable for different lev- conclusions about the temporal fluctuation in viral prevalence. The field data were collected as part of a Hendra virus study in Australia that collected almost 15,000 urine samples across a 2,300 km latitudinal gradient , methods for data collection have been described in detail elsewhere (Edson, Field, McMichael, Vidgen, et al., 2015; Edson et al., 2019; Field et al., 2011 Field et al., , 2015 . The data are comprised of two field survey efforts conducted between June 2013 and June 2014 in Boonah, Queensland, at an urban roost of pteropodid bats (i.e., Pteropus alecto, P. poliocephalus, and P. scapulatus). The first data set measures viral infection and routes of excretion for 1,012 individual black flying foxes (P. alecto) captured at the study roost (see Figure 1a ). Viral infection was recorded as present if RT-PCR analyses returned a cycle threshold (Ct) value of <40 for samples taken from any route of excretion (e.g., urine, urogenital, serum, nasal, oral, and rectal; see Edson, Field, McMichael, Vidgen, et al., 2015; Edson et al., 2019) . 
The second data set measured viral prevalence at the roost scale using the under-roost sheet sampling method, where large plastic sheets are divided into quadrants and urine samples are pooled within each sheet quadrant for RT-PCR testing with the same threshold of Ct <40 for positive samples (see Field et al., 2011, 2015). We used these roost-scale data to calculate viral prevalence

FIGURE 1 Conceptual drawing of sampling techniques commonly used to estimate viral prevalence at the roost level. Individual-level sampling is shown in (a), where individual bats are captured and each provides a sample that is used to calculate prevalence. Both (b) and (c) show under-roost sampling techniques that collect urine droplets from plastic sheets laid beneath roosts. The pooled quadrant technique (b) pools urine droplets that fall within each of the four quadrants of a plastic sheet. The pooled sheet technique (c) pools urine droplets within each plastic sheet. The examples of prevalence calculation show how overestimation of prevalence at the roost level can occur due to multiple bats contributing to a sample. Note that this toy example assumes all bats are captured and the assay used to test samples has perfect sensitivity and specificity.

CI). We fit a GAM with quasi-binomial error structure and a thin-plate spline based on date of sampling to the viral presence data collected at the individual, pooled quadrant, and pooled sheet levels (Wood, 2006). We then assessed the smoothed models by calculating the bias in mean estimated viral prevalence of the pooled quadrant and pooled sheet models in comparison with the individual-level model.
Day roosts of pteropodid bats encompass many trees, with individuals often moving within the roost throughout the day due to mating behaviors or in response to roost disturbance (Kunz & Fenton, 2006), so we modeled bat density within a generic bat roost with a Poisson cluster process of roosting positions and a spatial Gompertz probability density function that reflects movement within a roosting site. Specifically, bat density within roost area A (a disk with radius r) is constructed in four stages: (1) placement of roosting trees within the roost area, (2) clustering of individual bats around them, (3) individual-level movement within a tree, and (4) a separate model of roost-wide movement. We used a Thomas cluster process to simulate the spatial clustering of bat positions around trees, using the rThomas function from the spatstat package in the R programming language (Baddeley et al., 2015; R Core Team, 2016). Tree locations (parent points) were randomly distributed within A subject to a homogeneous intensity given by n_t/A, where n_t is the number of occupied trees in the roost. The number of bats in each roost tree is simulated by the cluster point process with mean µ, so that the realized count per tree is stochastic with Poisson-distributed error. Individual bat positions are determined according to an isotropic Gaussian kernel centered on each tree with radius r_t. Note that even when the tree intensity, r_t, and µ are fixed, the number of bats in the roost N_b will still vary between simulations because the Poisson point process is stochastic. In simulation scenarios, we chose ranges for parameters of roost structure and bat density based upon unpublished field data and expert observations (see Table 1 for a list of variables used to define each scenario). Bat movement was modeled at the individual level and roost level (see Figure 2).
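The clustering stages described above can be sketched in a few lines. This is an illustrative Python stand-in for a Thomas cluster process, not the spatstat rThomas implementation used in the study. All function and parameter names are hypothetical. As simplifying assumptions, parent trees are drawn only inside the roost disk, r_t is treated as the standard deviation of the Gaussian kernel, and offspring falling outside the disk are discarded.

```python
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson draw via Knuth's multiplication method (suitable for modest lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def uniform_in_disk(radius: float, rng: random.Random) -> tuple:
    """Uniform random point in a disk centered on the origin."""
    r = radius * math.sqrt(rng.random())  # sqrt keeps density uniform in area
    theta = 2.0 * math.pi * rng.random()
    return (r * math.cos(theta), r * math.sin(theta))


def simulate_roost(radius, n_trees, mean_bats_per_tree, tree_sd, seed=None):
    """Return (tree positions, bat positions) for one simulated roost."""
    rng = random.Random(seed)
    # Stage 1: a Poisson number of occupied trees (mean n_trees) in the disk.
    trees = [uniform_in_disk(radius, rng) for _ in range(poisson(n_trees, rng))]
    bats = []
    for tx, ty in trees:
        # Stage 2: a Poisson number of bats per tree, Gaussian-displaced.
        for _ in range(poisson(mean_bats_per_tree, rng)):
            x, y = rng.gauss(tx, tree_sd), rng.gauss(ty, tree_sd)
            if math.hypot(x, y) <= radius:  # keep bats inside the roost area
                bats.append((x, y))
    return trees, bats


if __name__ == "__main__":
    trees, bats = simulate_roost(radius=30.0, n_trees=20,
                                 mean_bats_per_tree=50, tree_sd=2.0, seed=1)
    print(len(trees), "trees;", len(bats), "bats")
```

Because both the number of trees and the number of bats per tree are Poisson draws, repeated calls with different seeds give different roost sizes even when all parameters are held fixed, which mirrors the variability in N_b noted above.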
To model individual-level movement, we calculated a kernel density estimate for the simulated point process that sums Gaussian kernels with a radius of 0.5 m centered on each bat position. We modeled roost-wide movement with a spatial Gompertz probability density using the dgompertz function from the flexsurv package (Jackson, 2014) . The distribution of the Gompertz is controlled by shape and rate parameters that determine the function's curvature and rate of decay, respectively. We chose ranges for these parameters that make the least assumptions about movement, where values are high for a large area at the roost's center, but decay quickly toward the edges. To make the final kernel density estimate for bat density, we combined models of individual-and roost-level movement and ensured that the function integrated to 1 (Figure 2 ). Under-roost sampling designs typically use large sheets placed under-roost trees, and urine droplets or fecal particles are pooled into an aggregate sample from the area (or subarea) of each sheet. Most studies provide minimal description of the sheet sampling design; however, Wacharapluesadee et al. (2010) , Field et al. (2015) , describe their quadrant-based sheet design in greater detail (i.e., sheet dimensions, number of sheets, pooling of urine samples). We therefore explored the effect of four different under-roost sheet sampling designs: quadrant, uniform, stratified, and random (Thompson, 2012) [see Figure 3 ]. An efficient way to simulate each sampling design within two-dimensional circular space uses hexagonal tiles, where the size and combination of tiles selected can replicate different sheet-based sampling designs. We calculated the number of bats roosting and moving above a sampling sheet by using the area of each hexagonal polygon to define the space of integration S. We determined the dimensions for the quadrant-based design using descriptions of under-roost sheet sampling of Australian fruit bats found in Field et al. 
(2015) and . Here, 10 large 3.6 × 2.6 m sheets were placed under the roost and divided into 1.8 × 1.3 m quadrants, where urine samples were pooled within each quadrant (allowing up to 4 samples per large sheet). Considering each quadrant to be its own "sheet," we replicated this sampling design by making a hexagonal grid with each tile area equivalent to a 1.8 × 1.3 m rectangular sheet. Groupings of 4 hexagonal tiles then suffice as a large sheet with 4 quadrants. In each simulation, we generated 10 sheet positions within A using a simple sequential inhibition point process with the rSSI function of the spatstat package (Baddeley et al., 2015) . To ensure that all sheets retained the same quadrant orientation and that no two sheets were directly adjacent, we generated sheet positions within a disk of A − 3m and set the inhibitory radius to 3s, where s is the hexagonal cell size. The four cell centers nearest each of the 10 simulated point locations comprised the 40 (10 × 4 quadrants) hexagonal tiles for the quadrant-based design ( Figure S2 ). To test our hypothesis that a larger number of smaller sheets will estimate roost-level prevalence more accurately, we generated hexagonal grids with cell size s that select h number of tiles in a uniform, stratified, or random pattern. Both uniform and random designs are straightforward, but the stratified sampling design was generated using a sequential inhibition point process, where random points are laid down sequentially, retaining only those that are placed further than a specified inhibitory radius r s . This is similar to a person attempting to lay down sheets randomly with one rule in mind-"Do not place sheets within r s distance of each other." We simulated sheet sampling designs with the sheetsamp function in the R code provided in Supporting Information. 
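The "do not place sheets within r_s of each other" rule can be sketched as a simple sequential inhibition procedure. The Python code below is an illustrative stand-in, not the rSSI function from spatstat or the sheetsamp function from the Supporting Information. Names are hypothetical, and sheet centers are drawn in continuous space rather than selected from a hexagonal grid as in the study.

```python
import math
import random


def stratified_sheet_positions(n_sheets, roost_radius, inhibition_radius,
                               max_tries=100_000, seed=None):
    """Place up to n_sheets sheet centers in a disk by sequential inhibition.

    Candidates are drawn uniformly in the disk one at a time and kept
    only if they lie at least inhibition_radius from every sheet already
    accepted. Fewer than n_sheets positions are returned if max_tries
    candidates are exhausted first.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n_sheets:
            break
        r = roost_radius * math.sqrt(rng.random())
        theta = 2.0 * math.pi * rng.random()
        x, y = r * math.cos(theta), r * math.sin(theta)
        far_enough = all(math.hypot(x - px, y - py) >= inhibition_radius
                         for px, py in accepted)
        if far_enough:
            accepted.append((x, y))
    return accepted


if __name__ == "__main__":
    sheets = stratified_sheet_positions(n_sheets=100, roost_radius=30.0,
                                        inhibition_radius=2.0, seed=42)
    print(len(sheets), "sheets placed")
```

The cap on attempts is a design choice for the sketch: when the requested number of sheets cannot fit at the requested spacing, the function returns a partial layout instead of looping indefinitely.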
Figure 3 displays an example of a simulation which has generated the previously implemented large-sheet quadrant design and three additional "small-sheet" designs that use a larger number of smaller (1 × 1 m), more dispersed sheets. Given a roost area A, the polygons produced by the sheetsamp
Bats in the upper strata of the canopy are less likely to contribute urine to the sheet below because of obstruction by individuals below or factors in the environment (e.g., wind, tree branches). Therefore, a urine sample is collected from each of the sheets S according to a probability of urine contribution and collection p_u, with variation given by N(p_u, σ²). The number of individuals contributing to each pooled sample C_b is calculated as
Each simulated iteration generates an estimated intensity function for bat density and then performs under-roost sampling using each of the four sampling designs. Therefore, each sampling design is tested using the same set of bat density functions, facilitating comparison. Parameters for sheet size s and number of sheets H were fixed for the quadrant-based design to replicate the previously implemented field methods described above. Parameters controlling sampling dimensions for the three small-sheet designs were either fixed or varied over a range of plausible values depending on the question the simulations were meant to address; see Table 1 for a list of parameter values used in each scenario. For each iteration, we
We also calculated additional metrics such as the probability of obtaining a negative sheet (1 − p_u)^(D_b), the occurrence of a false negative (p̂_i = 0 | p_i > 0), Moran's I among sheets (Getis, 1973), and the Clark-Evans R clustering coefficient for individual bat roosting positions (Clark & Evans, 1954). In scenarios 1 and 2, we explored local sensitivity between estimated prevalence and some possible confounders and sources of bias, with values of other parameters fixed.
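The sheet-level sampling step can be illustrated with a small Monte Carlo sketch in Python. It is not the authors' R code, and it rests on assumptions that are not stated in the text above: D_b is taken to be the number of bats above a sheet, each of those bats contributes urine independently with a fixed probability p_u (the study additionally lets p_u vary), each contributor is infected independently with the true prevalence, and a pooled sample is positive if any contributor is infected. All names are hypothetical.

```python
import random


def prob_no_urine(p_u: float, d_b: int) -> float:
    """Probability that none of d_b bats above a sheet contributes urine."""
    return (1.0 - p_u) ** d_b


def sample_sheet(d_b, p_u, prevalence, rng):
    """Return None if no urine lands on the sheet, else True/False for a
    positive/negative pooled sample (assumes a perfect assay)."""
    contributors = sum(rng.random() < p_u for _ in range(d_b))
    if contributors == 0:
        return None
    return any(rng.random() < prevalence for _ in range(contributors))


def estimate_prevalence(bats_per_sheet, p_u, prevalence, seed=None):
    """Fraction of collected pooled samples that test positive."""
    rng = random.Random(seed)
    results = [sample_sheet(d, p_u, prevalence, rng) for d in bats_per_sheet]
    collected = [r for r in results if r is not None]
    if not collected:
        return None  # no sheet collected a sample
    return sum(collected) / len(collected)


if __name__ == "__main__":
    # Hypothetical comparison: 40 crowded sheets versus 100 sparse sheets.
    crowded = estimate_prevalence([12] * 40, p_u=0.5, prevalence=0.1, seed=7)
    sparse = estimate_prevalence([2] * 100, p_u=0.5, prevalence=0.1, seed=7)
    print("crowded sheets:", crowded, "| sparse sheets:", sparse)
```

Under these assumptions, sheets with many bats overhead almost always yield a sample and that sample is more often positive, whereas sheets with few bats overhead yield fewer samples whose positivity stays closer to the true prevalence.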
To perform a simple comparison between the four under-roost sheet sampling methods, we fixed all values of bat density and movement to simulate a roost with a 30 m radius and a mean number of 5,000 individuals (see scenario 1 in Table 1). We performed 1,000 simulations with true prevalence p set at a plausible value of 0.1 according to mean prevalence estimated for roosts near the QLD-NSW border in Field et al. (2015). Estimated prevalence values were plotted, along with the probability of obtaining a negative sheet for each sampling design. To explore estimation bias over all values of true prevalence, we kept parameter values the same as scenario 1, but we allowed true prevalence to vary from 0 to 1, and then plotted true versus estimated prevalence along with mean estimation bias (scenario 2 in Table 1). In scenarios 3 and 4, we performed a large number of simulations (n_sims = 10,000) and allowed parameter values for each simulation to vary using Latin hypercube sampling. We then analyzed the output using boosted regression trees (BRTs; De'ath, 2007; Elith et al., 2008) as a global sensitivity analysis (described in Prowse et al., 2016) to identify the main sources of estimation bias and determine the optimal application of under-roost sheet sampling. Here, the BRTs link simulation inputs (the varied parameters) with simulation outputs (estimation bias and false-negative rate, which we used as responses). Parameter values were randomly sampled using the randomLHS function in the lhs package (Carnell, 2016), and BRTs were fitted using the gbm.step function and the gbm and dismo packages (Hijmans et al., 2016; Ridgeway, 2016). BRTs were fitted with appropriate error structure (Gaussian or Binomial) and meta-parameters set to ensure that the number of fitted trees exceeded 1,000, following Elith et al. (2008), with tree complexity, learning rate, bagging fraction, and number of cross-validation folds set to 4, 0.005, 0.7, and 10, respectively.
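Latin hypercube sampling of the simulation parameters can be sketched as follows. This Python code is an illustrative stand-in for the randomLHS function of the lhs package, not a reimplementation of it. The function name is hypothetical, and the parameter names and ranges in the example are placeholders, not the values from Table 1.

```python
import random


def latin_hypercube(n_samples, ranges, seed=None):
    """Draw n_samples parameter sets with one value per equal-width stratum.

    ranges maps a parameter name to (low, high). For each parameter the
    interval is split into n_samples strata, one value is drawn uniformly
    within each stratum, and the strata are shuffled independently per
    parameter so that strata are paired at random across parameters.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(unit)
        columns[name] = [low + u * (high - low) for u in unit]
    return [{name: columns[name][i] for name in ranges}
            for i in range(n_samples)]


if __name__ == "__main__":
    # Placeholder ranges for illustration only.
    design = latin_hypercube(
        5,
        {"sheet_area_m2": (0.25, 2.0), "n_sheets": (10, 200),
         "true_prevalence": (0.0, 1.0)},
        seed=11,
    )
    for row in design:
        print({k: round(v, 3) for k, v in row.items()})
```

Each row of the resulting design would parameterize one simulation run. Compared with simple random sampling, every stratum of every parameter is visited exactly once, which gives more even coverage of the parameter space for a given number of runs.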
BRTs act as an effective emulator here because they fit complex nonlinear relationships with up to third-order interactions (tree complexity = 4) among model parameters. Relative variable influence and individual response curves for each variable further allow general description of how sensitive estimation bias is to each parameter. In scenario 3, we compare the quadrant-based design with the stratified design while accounting for the variability in all other parameters to determine the main drivers causing differences in estimation bias. We chose to use only the stratified design as a candidate small-sheet design because the first two simulations suggested that the three small-sheet designs produce similar results, and the stratified design is most plausibly replicated in the field. Based on preliminary models, it appeared that a small-sheet sampling design which used ~100 sheets with an area of ≤1 × 1m 2 could attain low estimation bias. So, we fixed the parameters controlling sheet dimensions accordingly to facilitate comparison between the quadrant and stratified methods (see simulation 3 in Table 1 ). To explore the optimal application of the stratified sampling design, we performed a global sensitivity analysis using only the stratified sampling design in scenario 4. All parameters were varied as in scenario 3; however, sheet area s, number of sheets H, and distance between sheets (d s ; previously fixed at 2 m) were also varied over intervals of interest (scenario 4 in Table 1 ). We used a Latin hypercube to sample the parameter space and then fitted two BRT models using the variables that control the sheet sampling design as predictors (i.e., sheet area, number of sheets, distance between sheets, and Table 1 ). We then used the quadrant-based sheet sampling design to match the under-roost sampling techniques that produced the pooled quadrant level and pooled sheet-level data (see Field et al., 2011 Field et al., , 2015 . 
To assess how well this scenario simulates the observed field data, we then calculated the mean bias of the pooled quadrant and pooled sheet sampling methods for all simulations and compared them with the observed bias in the field data. times higher (0.31-6.5 95% CI) when using the pooled quadrantlevel data and 8.5 times higher (0.47-23.2 95% CI) for the pooled sheet-level data (see Figure 1 for sampling techniques and Figure 4 for fitted models). Unsurprisingly, these models indicate that underroost sampling methods that use the quadrant-based design to sample tree-roosting fruit bats are indeed prone to overestimation of viral prevalence. Further, when we used the Beta distribution-fitted to observed values of viral prevalence from the individual-level data-as values of true prevalence in under-roost sampling simulations (see scenario 5 in Table 1 ), we obtained similar estimates of sampling bias for the quadrant-based sheet design (Table S1 ). This simulation scenario estimated the mean bias in prevalence to be 0.06 (−0.06 to 0.38 95% CI) at the pooled quadrant level and 0.21 (−0.06 to 0.73 95% CI) at the pooled sheet level. This amount of estimation bias produced estimates of viral prevalence that were on average 2.5 times higher (0-12.1 95% CI) for the pooled quadrant level and 6.9 times higher (0-39.4 95% CI) for the pooled sheet level compared with simulated values of true prevalence. While the confidence intervals in the simulated data are larger than those observed in the field data, the values of mean bias are closely comparable which provides validation for using the theoretical models to assess optimal under-roost sampling designs. 
When we compared the quadrant-based sheet design to the small-sheet designs with fixed model parameters (scenario 1 in Table 1), we found that at a low value of true prevalence (0.1), the quadrant design exhibited strong positive bias and all three small-sheet designs produced similar estimates close to the fixed value of true prevalence (see top row of Figure S3). The differences in estimated values can be partially attributed to the increased number of bats that roost and move above the larger sheets, which decreases the probability of obtaining a negative sheet (see bottom row of Figure S3). Local sensitivity analysis revealed that, at a low value of true prevalence, prevalence estimation for the quadrant-based design is sensitive to spatial autocorrelation among sheets (Moran's I) and clustering of bat roosting positions (Clark-Evans R; Figures S4 and S5). However, the small-sheet designs are sensitive to the number of bats in the roost (N_b; Figure S6). This indicates that, at low values of true prevalence, the quadrant-based method remains sensitive to viral presence regardless of the roost population size, but will tend to overestimate viral prevalence due to the spatial clustering of individuals common to most tree-roosting bats. Conversely, small-sheet methods appear less affected by clustering and spatial autocorrelation among sheets, but they are likely to be less sensitive to viral presence at low population sizes.

FIGURE 4 Examples of one simulation of each of the four under-roost sheet sampling designs explored in this study, generated for a roost with a 30 m radius. The quadrant design (top left), which follows methods found in previously published studies (Field et al., 2011, 2015), is comprised of 10 3.6 × 2.6 m sheets, each divided into 1.8 × 1.3 m quadrants for pooling urine samples. The other three designs (uniform, stratified, and random) are all "small-sheet" designs that reduce sheet area, increase sheet number, and disperse sheets about the roost area. The small-sheet designs plotted above each contain 100 sheets of 1 m². The stratified design is generated using a sequential inhibition process with an inhibitory radius of 2 m. [Correction added on 17 September 2021, after first online publication: Figure 4 caption has been updated in this version.]

In scenario 2, where we allowed true prevalence to vary between 0 and 1 (Table 1), we found that the quadrant design had 5-7 times the positive bias of the small-sheet designs. The mean estimation bias was 0.21 for the quadrant design, and 0.04, 0.03, and 0.04 for the uniform, stratified, and random designs, respectively (Figure 5). This suggests that, for a roost size of 3,000-8,000 bats, the estimation bias will consistently be greater for the quadrant design, especially for intermediate values of prevalence. Additionally, the similarity among the uniform, stratified, and random designs indicates that the exact spatial pattern of the small-sheet method is not important: estimation bias is improved by reducing sheet size, increasing the number of sheets, and spreading sheets out within the roost area. Using these sampling strategies to reduce estimation bias allows under-roost sampling techniques to more effectively emulate individual-level sampling. Scenario 3 showed significant differences in estimation bias between quadrant and stratified designs, even when we allowed all parameters to vary (Figure 6e). Summary of simulation output with the BRT emulator showed higher bias for the quadrant design, which is most strongly influenced by the total number of individual bats sampled across all sheets (∑C_b; Figure 6a
sheets that collected a urine sample, and 64% collected at least 40 samples (Figure S7).
Under-roost sampling of bat viruses has been employed previously in Africa, Asia, and Australia; however, little attention has been given to the effects of sampling bias or optimization of sampling designs.
We used data from field studies of Hendra virus in Australia, which have been extensively studied at both the individual and roost scales to describe temporal and spatial dynamics of viral transmission in bat populations (Edson, Field, McMichael, Vidgen, et al., 2015; Edson et al., 2019; Field et al., 2011, 2015; Smith et al., 2011) and ecological drivers of excretion into the environment (Giles et al., 2018; Páez et al., 2017). We combined these data to compare viral prevalence estimated using individual-level data to that estimated at two levels of sample pooling and found that systematic pooling of urine samples can lead to overestimation of viral prevalence (Figure 4 and Figure S1). We also show that theoretical models of bat density and under-roost sampling can replicate patterns of estimation bias observed in field data, allowing us to use simulations to optimize under-roost sampling designs. Previous work has elucidated factors contributing to sampling bias of zoonotic diseases on larger temporal and spatial scales and noted the importance of targeted sampling designs, but to our knowledge, this is the first study to use data and models to investigate the impact of sampling bias on the estimation of viral prevalence in bat populations at the roost scale.

FIGURE 6 Results of the global sensitivity analysis performed in scenario 3, where the quadrant (blue points) and stratified (orange points) designs are compared to determine what drives differences in estimation bias between the two designs. Table 1 shows the parameters used in the simulation. The barplot (a) shows the relative influence of each parameter determined by a boosted regression tree emulator. Plots e and f show the value of estimation bias fitted by the emulator as a function of five influential parameters (blue: quadrant, orange: stratified sampling design).
The simulation scenarios we developed provide insight into the mechanistic drivers of estimation bias associated with under-roost sampling in a theoretical population of tree-roosting bats. First, sampling designs which use large sheets (larger than ~1 m²) and/or sheet quadrants to pool urine samples are sensitive to viral presence, but they potentially overestimate viral prevalence with a bias up to 7 times greater than a design with a greater number of smaller sampling sheets (Figure 5). Second, estimation bias is affected by the number of individuals allowed to contribute to a pooled sample and spatial autocorrelation among sampling sheets; however, these sources of bias can be reduced by adjusting the sheet sampling design (Figure 6). And third, assuming a roost population size of over 5,000 bats, estimation bias can be sufficiently reduced by collecting 30-40 pooled urine samples using a stratified sheet sampling design that uses 80-100 sheets, each with an area of 0.75-1 m², that are separated by 1-3 m (Figure 7 and Figure S7). While field conditions may impact the total number of sheets that can be placed under a roost, our results indicate that the large number of sheets increases the likelihood that a urine sample is contributed to a sheet despite the smaller per-sheet area. These insights from simulation models enable well-informed hypotheses about the optimal sheet design for under-roost sampling, which can help to refine the application of under-roost sampling in the surveillance of infectious viruses in wild bat populations.

Our recommendations to optimize under-roost sampling differ from those previously implemented in the field in that they reduce sheet area, increase the number of sheets, and disperse them about the roost area. In relation to the best-described methods in the literature, this is roughly equivalent to halving the size of sheet quadrants in Field et al.
(2015) and to make 80 sheets of 0.9 × 0.8 m, and then separating each [...] where they tested individual droplets to minimize the risk of multiple individuals contributing to a sample, but this requires low bat density and returns small sample volumes, which limits larger-scale application. Therefore, "optimal" application of an under-roost sampling design is still inherently limited to pooled sheet-level estimates of prevalence. We also acknowledge that local topography around a roost can make implementation challenging. Local factors at the roosting site (e.g., physical obstructions, understory vegetation, slope) must be considered when applying sampling designs in the field. This highlights the difficulty in entirely removing positive bias associated with under-roost sampling of bat viruses; however, it can be mitigated with a sampling strategy that reduces the area of urine pooling and limits spatial autocorrelation among sheets.

Overall, our results indicate that under-roost sampling designs as they have been applied in the past are poorly suited to studying viral dynamics because of positive sampling bias. For example, Páez et al. (2017) analyzed data from an under-roost sampling study and noted that a large amount of variation in viral prevalence was explained by differences in sampling sheets, indicating that population structure within roosts or sampling bias may have introduced additional variation in estimated prevalence. In light of the results from our simulation models, pooling urine samples drawn from large-sheet areas effectively inflates the number of Bernoulli trials in each binomial sample, which then increases the sensitivity of detection at the roost level. This may be observed as overestimation when the pooled samples are subsequently used to calculate roost-level prevalence in field studies.
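The Bernoulli-trial argument above can be made concrete with a small calculation. If each bat contributing to a pooled sample is independently positive with probability p, a sample pooled from n bats is positive with probability 1 − (1 − p)^n. The sketch below is a simplified illustration, not the study's simulation model: it assumes independent infection status among bats, a perfect assay, and no dilution effects, and the prevalence and pool sizes are hypothetical values.

```python
def pooled_positive_probability(prevalence, bats_per_sample):
    """Chance that a pooled sample contains at least one positive bat.

    Assumes independent infection status and a perfect assay (simplification).
    """
    return 1.0 - (1.0 - prevalence) ** bats_per_sample


true_prevalence = 0.05  # hypothetical individual-level prevalence
for n in (1, 3, 10):    # hypothetical numbers of bats contributing per sample
    apparent = pooled_positive_probability(true_prevalence, n)
    # Ratio of apparent (pooled) prevalence to true individual-level prevalence.
    print(n, round(apparent, 3), round(apparent / true_prevalence, 1))
```

With a single contributing bat the apparent and true prevalence coincide; as more bats contribute to each sample, the proportion of positive samples rises well above the individual-level prevalence, which is the overestimation described in the text.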
Although we focus on roost-level sensitivity here, we note that sample pooling could also impact assay sensitivity within a sample through the dilution (i.e., multiple species contributing to a sample) or concentration (i.e., through partial evaporation) of urine or fecal matter on plastic sheets. Collecting pooled samples from a smaller sheet area reduces the number of bats contributing to a sample, which may require practical consideration of the sample volume needed for the assays. Therefore, these small-sheet sampling designs have the potential to reduce overestimation, with the caveat that smaller sheets are less likely to collect adequate sample volumes, necessitating a larger number of sheets placed under the roost.

While the data and models presented here focus on testing pooled urine samples, positive bias associated with under-roost sampling designs also applies to viral pathogens found in bat fecal samples (Ge et al., 2012). Given that bats have been implicated as the probable natural hosts of the coronaviruses from which SARS-CoV-2 emerged to cause the global COVID-19 pandemic (Zhou et al., 2020), there has been a call for increased surveillance of novel coronaviruses in wild bat populations (Wacharapluesadee et al., 2021). In scenarios where surveillance of coronaviruses (or other viral pathogens) aims to estimate viral prevalence using excreta collected with under-roost sampling designs, the modeling techniques we have employed here can be applied to optimize the sampling strategy. Optimization of noninvasive sampling in this context will be an important tool to balance surveillance efforts required for public health with conservation of wild bat populations.

We have shown that sheet design in under-roost sampling can have a significant impact on both the estimation of viral prevalence and the false-negative rate when determining viral presence.
The sampling design employed, therefore, depends on the aim of the study, because viral discovery and studies on dynamics require different approaches. Research focusing on viral discovery requires field methods that reduce the probability of a false negative regarding viral presence (sensitivity). Studies on dynamics must estimate prevalence with low bias, requiring samples that are accurately classified as present and absent (specificity). Therefore, if a study includes multiple aims, an efficient adaptation of a small-sheet design includes pooling urine over multiple spatial scales, with samples pooled over a large area to test for viral presence with high sensitivity and samples pooled over a small area for estimating prevalence with high specificity. This type of multistage approach is analogous to "herd-level" testing, where a pooled sample is used to determine the presence or absence of a disease; if a pooled sample is found positive, individual-level samples are then used to identify infected individuals or calculate prevalence more accurately (Martin et al., 1992). Multistage sample pooling may be especially useful for other disease systems where individual capture of free-ranging host species is not practical, such as aquatic animals (Laurin et al., 2019; Sabino-Pinto et al., 2019), poultry (Arnold et al., 2009; Fereidouni et al., 2012), livestock (Arnold et al., 2005; Christensen & Gardner, 2000), and wildlife (Walton et al., 2016).

Given the challenges associated with under-roost sampling, our simulation models and recommendations for a small-sheet sampling design provide specific changes to existing methods that facilitate further adaptation of sampling designs in a model-guided fieldwork approach. If applied in a manner suited to the study aims, under-roost sampling can achieve longitudinal sampling of a bat population at the roost scale that is both cost effective and reduces exposure to infectious viruses.
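To illustrate how pooled results relate to individual-level prevalence in herd-level testing, the sketch below applies the standard estimator for equal-sized pools, p̂ = 1 − (1 − x/m)^(1/k), where x of m pools test positive and each pool contains k individuals. This is a textbook calculation under strong assumptions (equal and known pool size, independent individuals, perfect assay) and is not a method used in this study; in under-roost sampling the number of bats contributing to a sheet is generally unknown, so `bats_per_pool` is a hypothetical input.

```python
def prevalence_from_pools(n_positive, n_pools, bats_per_pool):
    """Back-calculate individual-level prevalence from equal-sized pools.

    Assumes a known, equal pool size and a perfect assay (simplification).
    """
    if n_pools <= 0 or bats_per_pool < 1 or not 0 <= n_positive <= n_pools:
        raise ValueError("need 0 <= n_positive <= n_pools and bats_per_pool >= 1")
    pooled_fraction = n_positive / n_pools
    # Invert P(pool positive) = 1 - (1 - p)^k to recover p.
    return 1.0 - (1.0 - pooled_fraction) ** (1.0 / bats_per_pool)


# Hypothetical example: 12 of 40 pooled samples positive, 3 bats per sample.
naive = 12 / 40
adjusted = prevalence_from_pools(12, 40, 3)
print(round(naive, 3), round(adjusted, 3))
```

In this hypothetical example the proportion of positive pools is 0.30, while the back-calculated individual-level prevalence is about 0.11, which shows why pooled proportions should not be read directly as prevalence.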
A strategy to estimate unknown viral diversity in mammals
Optimization of group size in pool testing strategy for SARS-CoV-2: A simple mathematical model
A modelling approach to estimate the sensitivity of pooled faecal samples for isolation of Salmonella in pigs
The estimation of pooled-sample sensitivity for detection of Salmonella in turkey flocks
Spatial point patterns: Methodology and applications with R
Metagenomic study of the viruses of African straw-coloured fruit bats: Detection of a chiropteran poxvirus and isolation of a novel adenovirus
Cocirculation of diverse paramyxoviruses in an urban African fruit bat population
Temporal and spatial limitations in global surveillance for bat filoviruses and henipaviruses
Successful use of pooled sera to determine HIV-1 seroprevalence in Zaire with development of cost-efficiency models
Circulation of alphacoronavirus, betacoronavirus and paramyxovirus in hipposideros bat species in Zimbabwe
A search for Ebola virus in animals in the Democratic Republic of the Congo and Cameroon: Ecologic, virologic, and serologic surveys
Important reservoir hosts of emerging viruses
lhs: Latin Hypercube Samples
Herd-level interpretation of test results for epidemiologic studies of animal diseases
A novel approach for collecting samples from fruit bats for isolation of infectious agents
Isolation of Nipah virus from Malaysian Island flying-foxes
Tioman virus, a novel paramyxovirus isolated from fruit bats in Malaysia
Distance to nearest neighbor as a measure of spatial relationships in populations
Comparison of methods for estimation of individual-level prevalence based on pooled samples
Boosted trees for ecological modeling and prediction
The detection of defective members of large populations
Bats host major mammalian paramyxoviruses
Flying-fox roost disturbance and Hendra virus spillover risk
Routes of Hendra virus excretion in naturally-infected flying-foxes: Implications for viral transmission and spillover risk
Time of year, age class and body condition predict Hendra virus infection in Australian black flying foxes (Pteropus alecto)
A working guide to boosted regression trees
Saving resources: Avian influenza surveillance using pooled swab samples and reduced reaction volumes in real-time RT-PCR
Hendra virus infection dynamics in Australian Fruit Bats
Spatiotemporal aspects of Hendra virus infection in Pteropid bats (flying-foxes) in Eastern Australia
Metagenomic analysis of viruses from bat fecal samples reveals many novel viruses in insectivorous bats in China
Classics in human geography revisited: Cliff, A
gilesjohnr/batsamp: Code and data used in Giles et al 2021 - Optimizing non-invasive sampling of a zoonotic bat virus v1.0
Environmental drivers of spatiotemporal foraging intensity in fruit bats and implications for Hendra virus ecology
Assessment of sample pooling for clinical SARS-CoV-2 testing
Emerging viruses: Coming in on a wrinkled wing and a prayer
Isolation of Hendra virus from pteropid bats: A natural reservoir of Hendra virus
Demography of straw-colored fruit bats in Ghana
dismo: Species Distribution Modeling
flexsurv: Flexible parametric survival and multi-state models
Molecular evidence of Ebola Reston virus infection in Philippine bats
Bat ecology (799 pp.)
To pool or not to pool? Guidelines for pooling samples for use in surveillance testing of infectious diseases in aquatic animals
Bats are natural reservoirs of SARS-like coronaviruses
Detection and characterization of a novel bat-borne coronavirus in Singapore using multiple molecular approaches
Detection of Alphacoronavirus in velvety free-tailed bats (Molossus molossus) and Brazilian free-tailed bats (Tadarida brasiliensis) from urban area of Southern Brazil
Screening for the presence of a disease by pooling sera samples
Cedar virus: A novel Henipavirus isolated from Australian Bats
Evaluating the health status of herds based on tests applied to individuals
Physiological stress and Hendra virus in flying-foxes (Pteropus spp
Middle East respiratory syndrome Coronavirus in bats, Saudi Arabia
Discovery and characterization of novel bat coronavirus lineages from Kazakhstan
Estimation of the relative sensitivity of qPCR analysis using pooled samples
Factors affecting sensitivity and specificity of pooled-sample testing for diagnosis of low prevalence infections
Pooling RT-PCR or NGS samples has the potential to cost effectively generate estimates of COVID-19 prevalence in resource limited environments. medRxiv
fields: Tools for spatial data
Conditions affecting the timing and magnitude of Hendra virus shedding across pteropodid bat populations in Australia
Synchronous shedding of multiple bat paramyxoviruses coincides with peak periods of Hendra virus spillover
Sampling to elucidate the dynamics of infections in reservoir hosts
Pulau virus; a new member of the Nelson Bay orthoreovirus species isolated from fruit bats in Malaysia
An efficient protocol for the global sensitivity analysis of stochastic ecological models
Bats are a major natural reservoir for hepaciviruses and pegiviruses
R: A Language and Environment for Statistical Computing
Model-guided fieldwork: Practical guidelines for multidisciplinary research on wildlife ecological and epidemiological dynamics
gbm: Generalized Boosted Regression Models
Ecological correlations and the behavior of individuals
Detectability vs. time and costs in pooled DNA extraction of cutaneous swabs: A study on the amphibian chytrid fungi
Identifying Hendra virus diversity in pteropid bats
Bats and their virome: An important source of emerging viruses capable of infecting humans
Ecological and anthropogenic drivers of rabies exposure in vampire bats: Implications for transmission and control
Sampling
Isolation of genetically diverse Marburg viruses from Egyptian fruit bats
Detection of novel coronaviruses in bats in Myanmar
A longitudinal study of the prevalence of Nipah virus in Pteropus lylei bats in Thailand: Evidence for seasonal preference in disease transmission
Evidence for SARS-CoV-2 related coronaviruses circulating in bats and pangolins in Southeast Asia
The ecology of wildlife disease surveillance: Demographic and prevalence fluctuations undermine surveillance
Bats and viruses: A new frontier of emerging infectious diseases
Generalized additive models: An introduction with R
A pneumonia outbreak associated with a new coronavirus of probable bat origin

The authors declared no conflicts of interest.
JRG conducted the AJP was supported by a Queensland Government Accelerate