key: cord-0309538-ylis52ut authors: Gilson Silva, A. V. F.; Menezes, D.; Moreira, F. R. R.; Torres, O. A.; Fonseca, P. L. C.; Moreira, R. G.; Alves, H. J.; Alves, V. R.; Amaral, T. M. d. R.; Coelho, A. N.; Saraiva-Duarte, J. M.; da Rocha, A. V.; de Almeida, L. G. P.; de Araujo, J. L. F.; de Oliveira, H. S.; de Oliveira, N. J. C.; de Sa, C. Z.; de Souza, J. H.; de Souza, E. G.; de Souza, R. M.; Ferreira, L. d. L.; Gerber, A. L.; Guimaraes, A. P. d. C.; Maia, P. H. S.; Marim, F. M.; Miguita, L.; Monteiro, C. C.; Saliba Neto, T.; Pugedo, F. S. F.; Queiroz, D. C.; Cuzzuol Queiroz, D. N. A.; Resende-Moreira, L. C.; Santos, F. M.; Sou, title: Seroprevalence, prevalence, and genomic surveillance: monitoring the initial phases of the SARS-CoV-2 pandemic in Betim, Brazil date: 2021-10-26 journal: nan DOI: 10.1101/2021.10.21.21265140 sha: 27fe9cc23849354dcd93679368dabcf7e3574603 doc_id: 309538 cord_uid: ylis52ut The Covid-19 pandemic has created an unprecedented need for epidemiological monitoring using diverse strategies. We conducted a project combining prevalence, seroprevalence, and genomic surveillance approaches to describe the initial pandemic stages in Betim City, Brazil. We collected 3239 subjects in a population-based age-, sex- and neighborhood-stratified, household, prospective; cross-sectional study divided into three surveys 21 days apart sampling the same geographical area. In the first survey, overall prevalence (participants positive in serological or molecular tests) reached 0.46% (90% CI 0.12% - 0.80%), followed by 2.69% (90% CI 1.88% - 3.49%) in the second survey and 6.67% (90% CI 5.42% - 7.92%) in the third. The underreporting reached 11, 19.6, and 20.4 times in each survey, respectively. We observed increased odds to test positive in females compared to males (OR 1.88 95% CI 1.25 - 2.82), while the single best predictor for positivity was ageusia/ anosmia (OR 8.12, 95% CI 4.72 - 13.98). Thirty-five SARS-CoV-2 genomes were sequenced, of which 18 were classified as lineage B.1.1.28, while 17 were B.1.1.33. Multiple independent viral introductions were observed. Integration of multiple epidemiological strategies was able to describe Covid-19 dispersion in the city adequately. Presented results have helped local government authorities to guide pandemic management. Since its emergence in December 2019, the new human coronavirus has had a tremendous impact on humanity due to the pandemic nature of its infection, called Covid-19 [1] . The SARS-CoV-2 pathogen was described on January 24, 2020. In Brazil, the first case of Covid-19 was reported on February 26, 2020, in the city of São Paulo [2] . The virus spread rapidly, and the country had the highest number of cases and deaths in Latin America, experiencing its first peak wave in late July 2020. Although most cases were identified in the most prominent Brazilian cities, São Paulo and Rio de Janeiro, dispersion to other municipalities were quickly reported. Betim, a town located in the Minas Gerais State in Brazil with an estimated population of 439,340 in 2019, had its first reported SARS-CoV-2 case on March 23, 2020, in two patients returning from Europe. Two months later, on May 23, 2020, only 73 confirmed cases had been reported, although 4380 suspected cases were identified in public databases indicating limited testing availability. Brazilian public healthcare system has prioritized testing subjects with symptoms due to scarce diagnostic tests, particularly in the early days of the pandemic. Since data suggest that symptomatic cases represent a fraction of persons infected with SARS-CoV-2, official statistics were expected to be underestimated [3] . Epidemiological surveillance using prevalence studies is needed to evaluate the true extent of SARS-CoV-2 dispersion, significantly extending testing to asymptomatic subjects. Combining serological and molecular tests may be a more robust strategy to uncover viral diffusion in a territory, avoiding each test's kinetic detection limitations. Valid prevalence and seroprevalence estimates for a population rely on two major factors: (i) a representative population sample and (ii) accurate diagnostic testing [4] . While the epidemiological investigation is essential for controlling Covid-19, genomic surveillance is equally crucial. Robust SARS-CoV-2 variant monitoring can track viral evolution, detect new variants, describe patterns and clusters of transmission, outbreak tracking, among others. Therefore, it can provide actionable information on implementing a more targeted public health strategy that addresses local priorities through stakeholder engagement and mitigation efforts [5] . Here, we conducted a study combining seroprevalence, prevalence, and genomic surveillance approaches to understand the SARS-CoV-2 epidemic spread in Betim city. The Research Ethics Committee approved the present experiment under protocol CAAE 31459220.2.0000.5651. We conducted a population-based age-, sex-and neighbourhood-stratified, household, prospective; cross-sectional study repeated every 21 days in the same geographic area to determine the extent of SARS-CoV-2 transmission in Betim, Minas Gerais, Brazil. Three surveys were held: June 3-5, June 23-25, and July [13] [14] [15] 2020 . The sample size (n = 1,080 each survey) was estimated considering dichotomous outcome (positive or negative), the population of 439,340 inhabitants, the confidence level of 90%, the maximum margin of error of 2.5%, and lack of a priori information on the prevalence of SARS-COV-2 in the municipality's population (the latter represented by p = q = 0.5) and using the equation below: Random sampling was employed to ensure representativeness of the population, stratified by sex, age (0 to 5; 6 to 19; 20 to 39; 40 to 59 and 60 years or older) and city neighbourhoods (Centro, Alterosas, Imbiruçu, Norte, Teresópolis, PTB, Citrolândia, Vianópolis, Icaivera, and Petrovale). Every census tract (population stratum created by Governmental agencies) was sampled with at least one address. In case of refusal or closed households, the closest home was selected. Thirty-six teams (one driver, one nurse, and one community health worker) worked on active sampling subjects in 1080 addresses during three days. Clinical and epidemiological data were obtained using a questionnaire during interviews with participants or their legal guardians who signed the Informed Consent. Biological samples were collected using a nasal swab to conduct RT-PCR and capillary blood obtained by fingerstick for the serological test. RT-PCR to detect SARS-CoV-2 RNA was initially conducted in pools of ten samples [6] . Whenever pools were positive, individual samples were examined. Real-Time RT-PCR Diagnostic Panel (N1, N2 and RNP primers). Serological tests were conducted using the SARS-CoV-2 Antibody Test (Guangzhou Wondfo Biotech Co., Ltd.) that detects IgM/IgG antibodies. The same test was used in a previous study in Brazil [7] . Reported sensitivity is 86.43% (95% CI: 82.41% ~ 89.58%) and specificity 99.57% (95% CI: 97.63% ~ 99.92%). We have validated antibody tests using serum samples from subjects who were SARS-CoV-2 positive confirmed with RT-PCR. Associations of each variable of interest with surveys (Table 1 ) and positive status (Table 2) were assessed using chi-square tests. Odds ratios were estimated using logistic regression with the glm function. Spatial geostatistical modelling and prediction were carried out using the gstat and predict functions from the gstat package. All analyses were carried out in R software (version 4.1.1). Whole viral genome amplification and DNA library preparation was carried out as described elsewhere [8] . Briefly, QIAseq SARS-CoV-2 Primer Panel -QIAGEN kit was used to amplify positive samples, following manufacturer instructions. In total, 39 of the 84 detectable samples were eligible for library preparation based on their CTs ≤ 30. Library concentration was measured using the QIAseq Library Quant Assay -QIAGEN kit, and the fragment integrity and size were evaluated using Bioanalyzer (Agilent Technologies, Waldbronn, DE). Sequencing was carried out on a MiSeq (Illumina, San Diego, CA, USA). The raw data generated were filtered by Trimmomatic v0.39 [9] , which trimmed low-quality bases (Phred score < 30) and removed short reads (<50 nucleotides) as well as adapters and primer sequences. Reads were then mapped against the SARS-CoV-2 reference genome (accession number: NC_045512.2) with Bowtie2 [10] . The resulting BAM files were manipulated with SAMtools, BCFtools [11] , and BEDtools [12] to generate consensus genome sequences. Bases with less than 10x sequencing depth were masked. In total, 35 of the 39 genome sequences presented coverage greater than 79% and average sequencing depth greater than 200x. Sequencing metadata is available in Table S1 . The 35 consensus genome sequences were submitted to the PANGOLIN 2.0 lineage classification tool (database version February 2, 2021) [13] . To confirm the PANGOLIN identification and further contextualize the diversity of lineages circulating in Betim, we performed a set of phylogenetic analyses. First, a global dataset was assembled from a subset of high-quality data available on GISAID and the newly generated genomes (n = 3,814). This dataset contained all Brazilian sequences and one per week for each country, as available on GISAID until January 12, 2021. These sequences were aligned with MAFFT v7.475 [14] , and a maximum likelihood tree was All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted October 26, 2021. ; https://doi.org/10.1101/2021.10.21.21265140 doi: medRxiv preprint inferred on IQ-Tree 2 [15] , under the GTR+F+I+G4 model [16] , [17] . Shimoidara-Hasegawa approximate likelihood ratio test (SH-aLRT) was used to assess branches' statistical support [18] . Two subsets of the previous dataset were assembled to explore the temporal Maximum likelihood trees were inferred from these datasets, and their temporal signal was evaluated with tempest v1.5.3 [19] . Time scaled phylogenies were then inferred from these datasets with BEAST v1.10.4 [20] , using: (i) the HKY+I+G4 nucleotide substitution model [17] , (ii) the strict molecular clock model, (iii) the nonparametric coalescent skygrid tree prior [21] and (iv) a symmetric discrete phylogeographic model [22] . A normal prior distribution (mean = 1.13 x 10⁻³; std= 5.1 x 10⁻⁴) on clock rate was assumed, based on a previous estimate [23] . The cutoff values of the skygrid tree prior were set based on the previously estimated dates for the emergence of each lineage [23] . The number of grids of the tree priors was set to match the approximate number of weeks comprehended between the estimated dates for lineages' emergence and the dates of the most recently sampled sequences (41 weeks, both datasets). Two and three independent chains of 200 million generations sampling every 10,000 states were performed for datasets B. Table 1 presents clinical and epidemiological data obtained from participants. No significant difference was observed for the presence of any prior health condition across surveys (pneumopathy, chronic neurological disease, pregnant, postpartum, chronic cardiovascular disease, chronic kidney disease, obesity, asthma, immunodepression, chronic liver disease, diabetes, hypertension, transplanted, cancer or any comorbidity) indicating proper sampling was conducted since there was no reason to find significant differences in the period. Four symptoms (cough, sore throat, myalgia, and rhinorrhea) and contact with a symptomatic person increased while international travel decreased. Prevalence and seroprevalence increased across surveys. Pandemic progression in Betim city is presented in Figure 1 . Confirmed cases underestimation was found in all three surveys. In the first survey, overall prevalence (participants positive in serological or molecular tests) reached 0.46% (90% CI 0.12% -0.80%), followed by 2.69% (90% CI 1.88% -3.49%) in the second survey and 6.67% (90% CI 5.42% -7.92%) in the third. The underreporting was obtained by the difference between survey prevalence and official data, and its magnitude reached 11, 19.6, and 20.4 times (distance between black dots and red curve in Figure 1B ). Active transmission areas (RT-PCR positive participants) were observed increasing across time ( Figure 1C -E). By the third survey, almost all populated city areas were likely to have viral circulation ( Figure 1E ). The same pattern of increase was observed in overall prevalence for most administrative regions ( Figure 1F -G). We have also evaluated whether clinical and epidemiological variables were associated with molecular or serological test positivity ( Table 2) . Several significant results were observed, mostly with reported symptoms (fever, cough, sore throat, dyspnoea, myalgia, rhinorrhea, respiratory discomfort, nausea/ vomit, headache, prostration, ageusia/ anosmia). We also observed increased odds to test positive in females compared to males (OR 1.88 95% CI 1.25 -2.82) and clear enrichment of positive cases in certain city regions (e.g., Imbiruçu and Terezópolis). Surprisingly, people with obesity were more likely to be positive (OR 3.33, 95% CI 1.68 -6.59). The single best predictor for positivity was ageusia/ anosmia (OR 8.12, 95% CI 4.72 -13.98). Non-significant results can be found in Table S2 . All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. In total, 35 novel SARS-CoV-2 genome sequences were obtained (GISAID EPI_ISL_5416087-5416121). The sequences were classified by PANGOLIN 2.0 to assess the genetic diversity of SARS-CoV-2 circulating in Betim. 18 another Brazilian region to Betim, in addition to other single events from RJ, another one from SP, and another from foreign sequences. All events presented high statistical support (PP > 92% for all events). All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. Betim is a medium-sized Brazilian city (439,340 inhabitants, 343 thousand square kilometres) crossed by national roads connecting major Brazilian cities and serving as a local hub for the Brazilian Public Health System. Understanding its pandemic dynamic may provide relevant information for municipalities with similar features. Here, we estimated the overall prevalence of active infections, seroprevalence and conducted genomic surveillance before the first pandemic wave in Betim. Brazilian molecular diagnostic capacity was insufficient in the first months of the pandemic [26] . Therefore, Covid-19 cases may have been included in the official statistics as severe acute respiratory infection cases with unknown aetiology. Data until May 2020 indicated a positive association between higher per-capita income and molecular Covid-19 diagnosis, while the severe acute respiratory infection cases with unknown aetiology were associated with lower per-capita income, suggesting a possible diagnosis bias related to economic status [27] . Inadequate diagnosis availability may lead to underreporting [28] . Our data estimated underreporting rates up to 20 times. No studies have been conducted in Brazil evaluating active infection prevalence using adequate sampling. Our study design was inspired by previous research conducted in Santa Clara, USA, using pooled samples [29] . Pooled PCR tests were initially suggested to be used in asymptomatic people [6] and later were recommended for surveillance studies in populations with low infection prevalence [30] . Seroprevalence studies were conducted during the first wave in Brazil that peaked in July 2020. Two of the highest city seroprevalences reported during the period were Boa Vista (25.4% in June 2020) [7] and São Luiz (40.4% between the end of July and August 2020) [31] , both in the northern area of the country. A nationwide survey carried out in May and June 2020 presented seroprevalence lower than two per cent during both surveys in all sampled cities neighbouring Betim (less than 200km), corroborating our findings [7] . Furthermore, seroprevalences higher than ten per cent were solely found in towns in the North Region [7] . In December 2020, Manaus, the largest city in the North Region, experienced a resurgence of Covid-19 [32] despite high seroprevalence [33] , likely due to the gamma variant [34] . Previous seroprevalence studies have indicated ethnic and socioeconomic bias for SARS-CoV-2 infection in Brazil since the pandemic's beginning [35] , [36] . Results from Rio de Janeiro in April 2020 indicated that younger blood donors with lower education All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted October 26, 2021. ; https://doi.org/10.1101/2021.10.21.21265140 doi: medRxiv preprint levels were more likely to test positive for SARS-CoV-2 antibodies [35] . A nationwide study revealed that the poorest quintile was 2.16 times more likely to test positive with the lowest risks among white, educated, and wealthy individuals [36] . Likewise, we found one of the highest prevalences in the poorest neighbourhood, Terezópolis, that include the largest slum of the city where more than 23 thousand people live. Further modelling results showed higher infection rates among young adults, lower socioeconomic status, and people without healthcare access in the less developed North and Northeast areas until August 2020 [37] . Betim also presents most of its inhabitants with less than 59 years (90.7%), but no age effect was observed in the infectivity rates. Increased female infection odds were observed, although previous reports indicated a gender predisposition towards death in some Brazilian regions with higher male risk [38] . One possible explanation could be that 70% of the global health workforce are women [39] and a gender bias of pandemic perception and attitude [40] . Covid-19 diffusion presents strong socio-spatial determinants. Relocation diffusion from more-to less-developed regions and hierarchical diffusion from countries with higher population and density were relevant since early 2020 [41] . Data indicated a similar pattern in the São Paulo State with contiguous diffusion from the capital metropolitan area and hierarchical with long-distance spread through major highways that connects São Paulo city with cities of regional relevance [42] . Modelling results revealed that São Paulo city may have accounted for more than 85% of the initial case spread in the entire country [43] . Betim is directly connected to São Paulo city by a main national highway which may have contributed to Covid-19 diffusion. Genomic surveillance is a powerful tool to elucidate viral dispersion patterns. The first sequencing work conducted in Brazil evaluated the first six positive individuals and reported the same predominant lineages found in Italy [44] . Later, a study with samples collected until late April 2020 from different country areas showed the dominance of clade B-derived lineages. At the national level, the respective frequency of these clades was seen in a 98.98%/1.02% ratio [23] . In Minas Gerais State, A lineages represented 2.5% of the infections, B.1 appeared in 92.5% of the samples, and B was responsible for 5% of the cases [45] . The exclusivity of lineages B.1.1.28 and B.1.1.33 circulating in Betim-MG from June to July 2020, given that multiple introductions from different country regions were demonstrated, is representative of the extent of these lineages' dominance in the Brazilian scenario at the moment. Independent introductions also All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. We want to thank nurses, community health workers, drivers and management personnel who collaborated in this project. We also thank Mr. Guilherme Carvalho da Paixão for his support. We gratefully acknowledge the authors from the originating laboratories responsible for obtaining the specimens and the submitting laboratories where genetic sequence data were generated and shared via the GISAID Initiative, on which this research is based (Table S3) . All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. and, more substantially, from the second to the third survey. All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted October 26, 2021. ; Figure 2 : Phylogenetic characterization of SARS-CoV-2 genomes characterized in Betim. A maximumlikelihood tree was inferred on IQ-Tree under the GTR+F+I+G4 model with a comprehensive reference dataset, encompassing all Brazilian sequences plus one international sequence per country per week, from late 2019 to January 12 2021 (n = 3,814). The phylogeny depicted exhibits a subtree of 2,023 tips that harbours all relevant diversity considered for this study, mainly lineages B.1.1.28 (light salmon) and B.1.1.33 (light blue) where the novel genome sequences sparsely clustered. Tip shapes mark sequences characterized in this study. The scale bar indicates average nucleotide substitutions per site. All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted October 26, 2021. ; https://doi.org/10.1101/2021.10.21.21265140 doi: medRxiv preprint Table S3 : Acknowledgement to sequences obtained from GSAID. All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. A pneumonia outbreak associated with a new coronavirus of probable bat origin SARS-CoV-2 isolation from the first reported patients in brazil and establishment of a coordinated task network Substantial underestimation of SARS-CoV-2 infection in the United States Comparison of seroprevalence of SARS-CoV-2 infections with cumulative and imputed COVID-19 cases: Systematic review Genomic surveillance to combat COVID-19: challenges and opportunities Pooling of samples for testing for SARS-CoV-2 in asymptomatic people SARS-CoV-2 antibody prevalence in Brazil: results from two successive nationwide serological household surveys Epidemic spread of sars-cov-2 lineage b.1.1.7 in Brazil Trimmomatic: A flexible trimmer for Illumina sequence data Ultrafast and memoryefficient alignment of short DNA sequences to the human genome The Sequence Alignment / Map (SAM) Format and SAMtools 1000 BEDTools: A flexible suite of utilities for comparing genomic features A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology MAFFT multiple sequence alignment software version 7: Improvements in performance and usability IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era Some probabilistic and statistical problems in the analysis of DNA sequences Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods New algorithms and methods to estimate maximum-likelihood phylogenies: Assessing the performance of PhyML 3.0 Exploring the All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen) Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10 Improving bayesian population dynamics inference: A coalescent-based model for multiple loci Bayesian phylogeography finds its roots Evolution and epidemic spread of SARS-CoV-2 in Brazil Posterior summarization in Bayesian phylogenetics using Tracer 1.7 GISAID: Global initiative on sharing all influenza data -from vision to reality Increasing molecular diagnostic capacity and COVID-19 incidence in Brazil Epidemiological and clinical characteristics of the COVID-19 epidemic in Brazil How many more? Under-reporting of the COVID-19 deaths in Brazil All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity Sample Pooling as a Strategy to Detect Community Transmission of SARS-CoV-2 A pooled testing strategy for identifying SARS-CoV-2 at low prevalence Population-based seroprevalence of SARS-CoV-2 and the herd immunity threshold in Maranhão Resurgence of COVID-19 in Manaus, Brazil, despite high seroprevalence Three-quarters attack rate of SARS-CoV-2 in the Brazilian Amazon during a largely unmitigated epidemic Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus, Brazil Seroprevalence of anti-SARS-CoV-2 among blood donors in Rio de Janeiro, Brazil Prevalence of antibodies against SARS-CoV-2 according to socioeconomic and ethnic status in a nationwide Brazilian survey Spatial pattern of COVID-19 deaths and infections in small areas of Brazil Ethnic and regional variations in hospital mortality from COVID-19 in Brazil: a crosssectional observational study Gender, race, and health workers in the COVID-19 pandemic Gender differences in COVID-19 attitudes and behavior: Panel evidence from eight countries The socio-spatial determinants of COVID-19 diffusion: the impact of globalization, settlement characteristics and population The use of health geography modeling to understand early dispersion of COVID-19 in São Paulo, Brazil The impact of super-spreader cities, highways, and intensive care availability in the early stages of the COVID-19 epidemic in Brazil Importation and early local transmission of covid-19 in brazil The ongoing COVID-19 epidemic in Minas Gerais, Brazil: insights from epidemiological data and SARS-CoV-2 whole genome sequencing