key: cord-0253857-iqcnfax5 authors: Sjodin, Anna R.; Anthony, Simon J.; Willig, Michael R.; Tingley, Morgan W. title: Accounting for imperfect detection reveals role of host traits in structuring viral diversity of a wild bat community date: 2020-06-30 journal: bioRxiv DOI: 10.1101/2020.06.29.178798 sha: c1dc2fbf06664ac6d33c024b8576d02f90e39142 doc_id: 253857 cord_uid: iqcnfax5 Understanding how multi-scale host heterogeneity affects viral community assembly can illuminate ecological drivers of infection and host-switching. Yet, such studies are hindered by imperfect viral detection. To address this issue, we used a community occupancy model – refashioned for the hierarchical nature of molecular-detection methods – to account for failed detection when examining how individual-level host traits affect herpesvirus richness in eight species of wild bats. We then used model predictions to examine the role of host sex and species identity on viral diversity at the levels of host individual, population, and community. Results demonstrate that cPCR and viral sequencing failed to perfectly detect viral presence. Nevertheless, model estimates correcting for imperfect detection show that reproductively active bats, especially reproductively active females, have significantly higher viral richness, and host sex and species identity interact to affect viral richness. Further, host sex significantly affects viral turnover across host populations, as females host more heterogeneous viral communities than do males. Results suggest models of viral ecology benefit from integration of multi-scale host factors, with implications for bat conservation and epidemiology. Furthermore, by accounting for imperfect detection in laboratory assays, we demonstrate how statistical models developed for other purposes hold promising possibilities for molecular and epidemiological applications. of host individuals must be sampled for population-level detection, and resulting datasets are 48 highly zero-inflated. Second, many wildlife viruses remain unknown to science [12, 13] . 49 Currently, two of the most common methods for characterizing viral communities and 50 finding new viral taxa are unbiased high-throughput sequencing (HTS; e.g. metagenomics) and 51 consensus polymerase chain reaction (cPCR; [10]). HTS is advantageous when the goal is to 52 construct an entire viral genome or when so little is known about the target taxon that primers 53 cannot be designed. However, HTS lacks the sensitivity of PCR methods. cPCR is a method that 54 relies on the use of degenerate primers for the broad amplification of both known and unknown 55 viruses [10, 12] . It targets genome regions that are highly conserved among related viruses (e.g. 56 at the family or genus level) and can detect multiple coinfecting viruses, if present. The 57 degenerate primers bind with different efficiency to different viruses within a targeted taxon, so 58 viruses with lower homology in the primer-binding region will not be detected as efficiently as 59 would viruses with high sequence homology. Additionally, degenerate primers are, by definition, 60 less specific. This reduces the sensitivity of cPCR and increases the probability of false negatives 61 when viral load in the sample is low. This is a particular problem in wildlife surveys, where 62 Immunity. From each captured bat, we recorded six traits: mass, forearm length, ectoparasite 145 intensity, sex, and reproductive status. Detailed descriptions of trait measurements can be found 146 in the Appendix. Additionally, wing punches were taken from each bat for a separate analyses. 147 To prevent re-sampling of recaptured bats, all subsequently captured bats with a punch-sized 148 hole in the wing were released. All methods were approved by the University of Connecticut 149 Institutional Animal Care and Use Committee (IACUC, protocol A15-032). were cloned into Strataclone PCR cloning vector (Figure 1, c1) , and twelve colonies were 155 sequenced to detect viral co-occurrence and confirm viral identity via cross-referencing against 156 the GenBank nucleotide database (Figure 1, d1) . No cut bands from assay one resulted in false 157 positives, so for assay two, only those gel products of expected size that were not present in 158 assay one (Figure 1 , b2) were cloned (Figure 1 , c2) and sequenced (Figure 1, d2) . Consequently, 159 there were multiple samples from assay two for which data exist only on herpesvirus detection 160 (positive or negative), but not on the identity of the detected herpesviruses (Figure 1, e2) . 161 Operational taxonomic units (OTUs) were determined using percent nucleotide identity and 162 affinity propagation [11, 52] and were used in place of viral species. 163 Each sample was tested for PCR inhibition using TaqMan Exogenous Internal Positive 164 Control Reagents (Applied Biosystems) with quantitative PCR as described in the 165 manufacturer's instruction manual. Amount of inhibition was determined using the standardized 166 difference of sample quantitation cycle (Cq) from the mean Cq of two negative controls. 167 To explore the effects of individual-level host traits on herpesvirus infection, we fit a 169 Bayesian community-level occupancy model to OTU-host occurrence data [53, 54] . This 170 framework explicitly differentiates between the underlying state process determining the 171 presence or absence of each OTU and the observation process determining the detection or non-172 detection of each OTU, conditional on its true presence. Although occupancy models have been 173 employed in the field of animal ecology to study the occurrence of vertebrates across a 174 landscape, the modeling framework has rarely yet been applied in molecular settings to account 175 for imperfect detection during indirect sampling, such as with eDNA [55, 56] In the model, response variable yi ,j,k is a binomial random variable denoting whether 183 herpesvirus OTU i was detected (yi ,j,k = 1) or not (yi ,j,k = 0) in host individual j during assay k (green 184 values in Figure 1e and f). A second response variable, Yj ,k, is a binomial random variable 185 denoting whether any herpesvirus, regardless of OTU identity, was detected (Yj ,k = 1) or not (Yj ,k = 186 0) in host individual j during assay k (purple values in Figure 1 e and f). The two response 187 variables, y and Y, are hierarchically related, yet independently observed via previously 188 described methods: cPCR identifies the presence or non-detection of any herpesvirus (Y), and 189 subsequent sequencing and bioinformatics identifies the presence or non-detection of particular 190 OTUs (y). 191 The model simultaneously estimates the probability of infection (yi ,j) and the probability 192 of detection (pi ,j,k) with a mixture model in the form yi ,j,k ~ Bernoulli (pi ,j,k * zi ,j), where pi ,j,k is the 193 probability of detecting OTU i in host individual j during assay k, and zi ,j is a latent variable 194 representing true infection of OTU i in host individual j, deriving from the equation zi ,j ~ Bernoulli 195 (yi ,j). A second latent variable (Zj) represents true herpesvirus infection, regardless of OTU 196 identity, in host individual j. The model's hierarchical structure allows the two response 197 variables (yi ,j,k and Yj ,k), as well as the two latent variables (zi ,j, and Zj ), to inform one another, as Zj = 198 max (zi, j) and Yj ,k ~ Bernoulli (passay * Zj ), where passay represents a constant probability that the cPCR 199 assay will detect a herpesvirus given the presence of at least one OTU. Together, the model 200 assumes that an empirical detection (yi ,j,k = 1, Yj ,k = 1) represents true viral infection, but that an 201 empirical non-detection (yi ,j,k = 0, Yj ,k = 0) could indicate either a true absence or a true infection 202 with failed detection. 203 To structure the model, sample inhibition and OTU subfamily were included as fixed 204 effects influencing OTU-specific detection probability (pi ,j,k). Host forearm length, cave identity, 205 ectoparasite intensity, sex, reproductive status, and the interaction between sex and reproductive 206 status were included as fixed effects for infection probability (yi ,j). Host species was included as a 207 random intercept for both detection and infection probabilities. 208 As is typical for community occupancy models, every intercept and slope parameter for a 209 given herpesvirus was assumed to be drawn from a hierarchical Normal distribution that 210 describes the distribution of slopes for all herpesvirus species available for sampling [59] . In this 211 way, primary inference on the community is made by the examination of the hyper-parameters 212 that provide a mean community-wide effect for any given covariate. As preliminary explorations 213 of the dataset indicated that some herpesviruses were unusually common or unusually rare, 214 instead of using a hierarchical Normal to describe the random distribution of occupancy 215 intercepts, we used a fat-tailed distribution, the t-distribution. Full model code is provided online 216 as Supplemental Material. 217 We executed the model in JAGS [60] using the R2jags package [61] with vague priors 218 (e.g., normal with µ = 0, t = 0.1, gamma with shape and rate = 0.01). We ran three chains of 219 15,000 iterations thinned by 50 with a burn-in of 25,000, yielding a posterior sample of 900 220 iterations across all chains. We checked convergence visually with traceplots and made 221 parameter inference based on 95% Bayesian credible intervals. Posterior predictive checks were 222 conducted for individual OTUs by calculating Bayesian p-values [62] and 90% Bayesian 223 credible intervals for mean detected prevalence of each OTU. A lack of fit for a particular OTU 224 would be indicated by a 90% credible interval that did not overlap the empirically observed 225 prevalence. All analyses were done using R version 3.4.3 [63] . 226 To examine viral diversity across the levels of host individual and population, we 228 compared a, b, and g diversity among host species and between host sexes using 900 host-by-229 OTU matrices drawn from the community occupancy model's posterior distribution of the true 230 underlying infection state, zi ,j. a diversity was calculated as the average viral richness per 231 individual bat across 1) all species for each sex, 2) both sexes for each species, and 3) each 232 species-by-sex combination (male A. jamaicensis, female A. jamaicensis, male P. quadridens, 233 etc.). g diversity was defined as the total viral richness for 1-3. b diversity was calculated using 234 both the multiplicative (g = a * b; [64]) and additive (g = a + b; [65]) models. 235 For a diversity, the general statistical design is that of a two-factor analysis of variance 236 (ANOVA) with replication, with individual bats representing replicates. To test the significance 237 of results, we calculated F-statistics for differences in a diversity associated with 1-3. We then 238 compared empirical F-statistics for each of the 900 posterior draws to three null distributions (H0: 239 no difference in a diversity between host sexes or among host species). Each null distribution 240 was generated using a different assumptions regarding the relationship between relative 241 abundances of each sex-by-species group, as compared to the true proportion of individuals in 242 that group. Details describing generation of the null distribution are available in the Appendix. 243 For b and g diversity, the general statistical design is that of a two-factor ANOVA 244 without replication, with treatment groups representing replicates. As such, values were only 245 compared between sexes and among species (1 and 2). We could not assess their interaction. 246 Simulation methods were identical to those for the a-level analyses (Appendix). All analyses 247 were done using R version 3.4.3 [63]. R code used for the simulation methods and calculation of 248 F-statistics is available at github.com/ARSjodin/AlphaBetaGamma. Table 2 ). 254 Observed prevalence was more likely to be underpredicted than overpredicted, indicating that the 255 model may be more pessimistic about detection probability than observed in the real dataset. 256 Model fits indicated OTUs were imperfectly detected at all stages of laboratory analysis. 258 The cPCR assay had a 0.778 probability (passay, 95CrI = 0.744-0.807) of detecting a herpesvirus 259 given the presence of at least one OTU (Table 1) . Given two cPCR assays, this suggests a 0.049 260 probability (95CrI = 0.037-0.065) of incorrectly assuming a host is not infected, were detection 261 assumed to be perfect. However, even if the cPCR assay suggests a positive infection, not every 262 OTU will be detected via sequence amplification. On average across all OTUs, the probability of 263 detecting a single OTU via sequencing was 0.699 (inverse logit-transformed "# , 95CrI = 0.637-264 0.762). There was no support for a role of either sample inhibition or viral subfamily in affecting 265 viral detection rates (Table 1) . Combined across both stages of viral detection, there was both a 266 non-zero probability that cPCR assays would fail to detect a viral infection, and that sequence 267 amplification would then fail to identify the full suite of OTUs present. Consequently, the fitted 268 model estimated on average that 1.75 hosts (95CrI = 0 -5) likely held infections of at least one 269 OTU that were not detected by cPCR, and that the mean number of OTUs missed per host was 270 0.036 (95CrI = 0.017-0.063). Across the tested host community (n = 41), this combined to an 271 expected undetected total of 12.2 OTU infections (95CrI = 5.5-21.0). 272 In general, herpesvirus occupancy was low in Puerto Rican bats, but the model uncovered 274 robust support for the importance of individual-level traits as drivers of viral richness (Table 1) . 275 Herpesvirus richness was higher in reproductively active bats compared to non-reproductive 276 bats. Specifically, reproductively active female bats had higher viral richness as compared to 277 both non-reproductive bats and to reproductively-active male bats. No other host traits 278 significantly affected the probability of viral occurrence. 279 Using model estimates of true occurrence correcting for imperfect detection, bats showed 280 differing degrees of herpesvirus diversity across community scales. a diversity was significantly 281 different between the sexes, but this effect differed by species, with only some bat species 282 showing sex-specific differences in a diversity (Figure 2, Appendix Table 3 ). Despite strong 283 differences in a diversity by species and sex, this did not scale up to g-level differences ( Figure 284 2, Appendix Table 3 ). Taken together, this means that although certain bat species (and sexes 285 within those species) host more OTUs than do other species, the identity of the OTUs is quite 286 similar across individuals of a species, indicating low OTU turnover at the population level. 287 Indeed, b diversity did not differ significantly across species (Figure 2, Appendix Table 3) . 288 However, the multiplicative form of b diversity (i.e. g = a * b) was significantly different 289 between sexes, with higher b diversity, and therefore higher turnover of OTUs, in females than 290 males (Figure 2, Appendix Table 3 ). Because no g-level difference existed between bat sexes, 291 and the sex differences at the a level depended on species, the differences in b diversity between 292 sexes is likely driven by a-level effects of certain species. 293 Scientists and public health officials have called for a shift from post-emergence, reactive 295 viral studies to predictive, preventative approaches [66, 67] . However, the integration of disease 296 ecology and ecological forecasting is still in its infancy and is constrained in utility [67, 68, 69, 297 70, 71, 72, 73 ]. An ongoing hindrance to progress is data latency, or the lag time between data 298 collection and data availability for analyses [72] . In the context of understanding viral shedding 299 in wildlife, this includes the time spent capturing and sampling hosts in the field, transporting 300 samples from the field to the laboratory, extracting genetic material and diagnosing infection, 301 and subsequent sequencing and bioinformatics. By linking increased viral shedding to more 302 easily and quickly attainable host attributes such as reproduction and the interaction between sex 303 and reproductive state, our study suggests that the use of host traits may serve as a proxy for 304 transmission risk, providing a potential solution to the data latency problem. For example, in 305 many bat species, including those tested here [37], reproductive status is a trait that is predictable 306 based on season, making it a more useful proxy in forecasting [74] . Assuming this relationship 307 holds for other virus-host systems, when scientists make forecasts about the risk posed by bats to 308 human health, reproductive status could potentially be added to models and updated temporally 309 At the level of individual hosts (a), host sex and species interacted to significantly affect 327 viral diversity (Table 2 ). This suggests that the difference in a diversity among the host species 328 is different for males and females of each species. This could be a result of an antagonistic 329 relationship between host species and sex (e.g. males of species one have higher viral diversity 330 than females of species one, but males of species two have lower viral diversity than females of 331 species two). Alternatively, this could be a result of differing magnitude of viral diversity among 332 host species and sex (e.g. males of species one have much higher viral diversity than females of 333 species one, but males of species two have only slightly higher viral diversity than females of 334 species two). Values of a diversity (Figure 2, Appendix Table 3 ) suggest that the differences in 335 viral richness between sexes generally differ in magnitude among host species, and the impact of 336 sex is not antagonistic among species. 337 Host sex, alone, had a significant effect on b diversity (using the multiplicative model), 338 with female bats hosting more compositionally heterogeneous infracommunities (i.e. higher b 339 diversity; Table 2, Figure 2 ). This was contrary to our hypothesis and shows that, despite limiting 340 range size and repeated exposure to the same individual conspecifics, the viral metacommunity 341 of female bats does not become homogenized compositionally. Instead, compositional 342 heterogeneity could be a result of strong physiological differences between immune-competence 343 of the sexes, especially during reproduction. Immune suppression of reproductively active 344 female bats may activate previously latent herpesviruses, resulting in viral infracommunities that 345 comprise OTUs acquired at any point in time. However, this could not be tested in our study. If 346 supported, it would suggest that viral composition in female bats may not be as temporally 347 constrained as in males, resulting in a more compositionally heterogeneous metacommunity. 348 Notably, differences in b diversity related to host sex are shaped strongly by two host 349 species, P. quadridens and P. portoricensis (Appendix Table 3 ). Unfortunately, little is known 350 about behavioral or physiological differences between the sexes of each of these species, that 351 may contribute to viral sharing in a manner that is different from that of other species. 352 Investigations of the mechanistic relationship between host physiology and viral infection is a 353 fruitful area for further research. Additionally, this difference in viral b diversity between male 354 and female P. quadridens is due primarily to differences in a diversity between the sexes, as g 355 diversity of male and female P. quadridens are indistinguishable (Appendix Table 3 ). For P. 356 portoricensis, the relationship is reversed. These results further emphasize the need to explicitly 357 incorporate both individual-and population-level heterogeneity in analyses of viral ecology. 358 In terms of viral detection, neither sample inhibition nor OTU subfamily influenced 359 overall viral detection rates (Table 1 ). The near-zero effect size of sample inhibition is important 360 because it demonstrates that samples do not contain contamination or non-viral genetic material 361 that prohibited detection of herpesviruses. Additionally, the near-zero effect size of OTU 362 subfamily shows that the assay is not biased towards either the Beta-or Gammaherpesvirinae. 363 Notably, no alphaherpesviruses were detected in the samples [11] , so the results make no 364 suggestions about the assay's efficiency in detecting viruses belonging to this subfamily. Still, 365 detection was imperfect, and reasons driving failed detection are yet to be determined. One 366 unmeasured factor that may influence the probability of detection in this system is viral load, or 367 the intensity of viral infection (i.e. viral "abundance" in each host). Indeed, infection intensity is 368 a significant driver of detection probability in other disease systems [24, 25] . In the context of 369 the present study, however, quantifying viral intensity in each sample would require 370 development of separate quantitative PCR assays for each OTU and was beyond the scope of this 371 Failed detection in this system can occur at the stage of sample collection (i.e. the host is 373 infected, but the virus is not in the sample) and at the stage of molecular analyses (i.e. the virus is 374 in the sample, but the laboratory methods did not detect it). Reasons could be biological (e.g. low 375 viral load associated with asymptomatic hosts) or methodological, if tests fail because of genetic 376 variation in the primer-binding sites or some other mechanism, such as failure to capture low-377 level sequences in the cloning step. The methods used here only account for failed detection at 378 the stage of molecular analyses -as only one sample was tested per host -thus failed detection 379 at the stage of sample collection will be perceived here as a lack of occurrence. However, our 380 modeling method could easily be expanded to accommodate the former, additional source of 381 heterogeneity, by taking and testing multiple samples per host. 382 Our study illustrates how viral community ecology can be used to identify risk factors for 383 infection while quantitatively accounting for imperfect detection, serving as a model for studies 384 of viral communities moving forward. Community-level occupancy modeling can be applied to 385 continued viral discovery efforts, molecular biology more broadly, and to applications of 386 approaches or concepts from community ecology to virus systems. For the herpesvirus assay 387 used in this study [51], the model indicated that two assay runs were able to detect approximately 388 Research. This project also benefited from support from the USAID PREDICT project. Figure 1 : Depiction of molecular methods informing use of hierarchical community-level 743 occupancy modeling. cPCR assays were run twice (a1 and a2). During assay 1, all gel bands 744 were cut (b1), cloned (c1), and sequenced (d1). This resulted in OTU-specific presence-absence 745 matrices that informed the response variable yi ,j,k , presence of OTU i in individual j during assay 746 k, for assay one (e1, green shading). These OTU-specific infections were then condensed to 747 general herpesvirus (i.e. non-OTU-specific) presence-absence to inform the response variable Yj ,k, 748 presence of herpesvirus in individual j during assay k (e1, purple shading). For assay 2, only 749 those gel bands that did not exist in b1 (e.g. Host 5, b2) were cut, cloned (c2) and sequenced 750 (d2). This informed yi ,j,k (e2, green shading) and Yj ,k (e2 purple shading) for assay 2. Bands that 751 existed in both assays (e.g. Hosts 1 and 3) were not cut, cloned, or sequenced during assay 2, and 752 therefore OTU identity was not known. Consequently, for assay 2, these data only informed Yj ,k 753 (e2 purple shading). Both response variables for both assays (f) were used in the modeling 754 framework. 755 Figure 2 : a, b, and g diversity by sex (black and grey shading), species (color), and species-by-756 sex groups. Rows represent diversity metrics (bM signifies the multiplicative model, and bA 757 signifies the additive model). Error bars indicate posterior uncertainty in metrics calculated using 758 900 z-matrices. P-values represent significance when comparing empirical metrics to a null 759 distribution using the two-step null generation method (see Appendix). P-values associated with 760 comparisons using two additional null distributions, which do not show substantive differences, 761 are included in Appendix Table 3 . R2jags: using R to run 'JAGS Posterior predictive assessment of model fit via realized 658 discrepancies Core Team. 2017 R: A language and environment for statistical computing. R Foundation 661 for Statistical Computing Vegetation of the Siskiyou Mountains On the relation between habitat selection and species 667 diversity An ecological and conservation perspective on 670 advances in the applied virology of zoonoses Predicting virus emergence amid evolutionary noise Viral host jumps: moving toward a predictive framework Data-model fusion to better 680 understand emerging pathogens and improve infectious disease forecasting Forecasting epidemiological and 684 evolutionary dynamics of infectious diseases Epidemic forecasting is messier than weather forecasting: the role of 689 human behavior and internet data streams in epidemic forecast Iterative near-term ecological 694 forecasting: needs, opportunities, and challenges On the predictability of infectious disease outbreaks Viruses in the sea Astrovirus infections in 705 humans and animals -molecular biology, genetic diversity, and interspecies 706 transmissions When bats go viral: negative 709 framings in virological research imperil bat conservation Ecological and 714 anthropogenic drivers of rabies exposure in vampire bats: implications for transmission 715 and control Marburgvirus resurgence in Kitaka mine 719 bat population after extermination attempts To cull, or not to cull, bat is the question Conservation Ecology of Bats Biological correlates of extinction risk in bats Population ecology of the gray bat (Myotis grisescens): factors influencing 732 early growth and development Designing occupancy studies: general advice and allocating 735 survey effort Model code: 815 816The following represents statistical code written in the JAGS modeling language for 817 fitting the multi-species occupancy model of viral community occurrence. To create the null distribution, all empirical viral richness values from all sampled bats 901were placed into a pool (Appendix Figure 1) . Richness values from individual bats were then 902 selected at random, and without replacement, from the pool and randomly assigned a host 903 treatment. Random draws continued until simulated treatment groups contained the same number 904of bats (N) as do empirical treatment groups. 905This simulation method assumes that the empirical relative abundances of each host 906 treatment group represent the true proportion of individuals of that treatment. We evaluated the 907 robustness of results under this assumption by expanding the methodological approach in two 908ways. First, we repeated the above methods, but sampled from the regional pool with 909replacement. Second, we adopted a two-part simulation in which step one included random 910 selection of a host species by sex treatment as a "donor", such that each treatment combination 911had the same chance of providing hosts to the simulated data set. In step two, we randomly drew 912 an infracommunity from the previously selected donor treatment and placed it into a randomly 913 selected "recipient" treatment. The donor treatments were sampled with replacement, and the 914 two-step simulation was repeated until the number of infracommunities in each recipient 915 treatment equaled empirical numbers. 916 917 918Appendix