key: cord-0287738-wghk8xbz
authors: Olm, Matthew R.; Dahan, Dylan; Carter, Matthew M.; Merrill, Bryan D.; Yu, Brian; Jain, Sunit; Meng, Xian Dong; Tripathi, Surya; Wastyk, Hannah; Neff, Norma; Holmes, Susan; Sonnenburg, Erica D.; Jha, Aashish R.; Sonnenburg, Justin L.
title: Robust Variation in Infant Gut Microbiome Assembly Across a Spectrum of Lifestyles
date: 2022-04-02
journal: bioRxiv
DOI: 10.1101/2022.03.30.486467
sha: 1a40c13bd4d8ef8c5b8244c2bad110fb8b7beead
doc_id: 287738
cord_uid: wghk8xbz

Infant microbiome assembly is intensely studied in infants from industrialized nations, but little is known about this process in populations living non-industrialized lifestyles. In this study we deeply sequenced infant stool samples from the Hadza hunter-gatherers of Tanzania and analyzed them in a global meta-analysis. Infant microbiomes develop along lifestyle-associated trajectories, with over twenty percent of genomes detected in the Hadza infant gut representing phylogenetically diverse novel species. Industrialized infants, even those who are breastfed, have microbiomes characterized by a paucity of Bifidobacterium infantis and gene cassettes involved in human milk utilization. Strains within lifestyle-associated taxonomic groups are shared between mother-infant dyads, consistent with early-life inheritance of lifestyle-shaped microbiomes. The population-specific differences in infant microbiome composition and function underscore the importance of studying microbiomes from people outside of wealthy, industrialized nations.

Recognition of work on indigenous communities

Research involving indigenous communities is needed for a variety of reasons, including to ensure that scientific discoveries and understanding appropriately represent all populations and do not only benefit those living in industrialized nations. Special considerations must be made to ensure that this research is conducted ethically and in a non-exploitative manner.
In this study we performed deep metagenomic sequencing on fecal samples that were collected from Hadza hunter-gatherers in 2013/2014 and were analyzed in previous publications using different methods (1, 2). A material transfer agreement with the National Institute for Medical Research in Tanzania ensures that stool samples collected are used solely for academic purposes; permission for the study was obtained from the National Institute of Medical Research (MR/53i 100/83, NIMR/HQ/R.8a/Vol.IX/1542) and the Tanzania Commission for Science and Technology; and verbal consent was obtained from the Hadza after the study's intent and scope were described with the help of a translator. The publications that first described these samples included several scientists and Tanzanian field-guides as co-authors for the critical roles they played in sample collection, but as no new samples were collected in this study, only scientists who contributed to the analyses described here were included as co-authors in this publication. It is currently not possible for us to travel to Tanzania and present our results to the Hadza people; however, we intend to do so once the conditions of the COVID-19 pandemic allow it.

The human gut microbiome undergoes a complex process of assembly beginning immediately after birth (3). New microbes attempting to engraft within this community often depend upon niches established by previous colonizing species, and thus the final adult microbiome composition may be contingent upon the species acquired early in life. The microbiome assembly process of infants living in industrialized nations is well-studied, and tends to follow a series of characterized steps that lead to the low-diversity gut microbiome composition characteristic of industrialized adults (4).
The microbiome assembly process that occurs in infants living non-industrialized lifestyles (which results in the characteristically diverse adult microbiomes of non-industrialized adults (2)) is largely unknown (5). Of particular interest are the timing at which the microbiomes of infants from different lifestyles diverge, the microbes and functions that are characteristic of infants from different lifestyles, and whether there are differences in the taxa that are vertically transmitted from mothers to infants and seed the microbiome assembly process. To address these questions we performed metagenomic sequencing on infant fecal samples from the Hadza, a group of modern hunter-gatherers in sub-Saharan Africa (1, 6). The Hadza inhabit semi-nomadic bush camps of approximately 5 to 30 people and exhibit a moderate level of community child rearing within the camps (7). Hadza infants are breastfed early in life and then, at approximately 2-3 years old, weaned onto a diet that consists of baobab powder, animal fat, and pre-masticated meat (8, 9). In this study we performed i) a global meta-analysis of infant microbiome samples sequenced using 16S rRNA amplicon sequencing, including 62 Hadza infant fecal samples, in order to contextualize the Hadza infant microbiome with as many samples as possible, and ii) deep metagenomic sequencing on 39 Hadza infant fecal samples in order to assess sub-species variation, functional potential, and patterns of vertical transmission (Table S1, S2).

To assess inter-individual variation of infants across lifestyles, we curated a global dataset of 16S rRNA sequencing samples from 1,900 healthy infants aged 0-36 months from 18 populations (including the Hadza samples described above) (1, 2, 4, 10-14) (Fig. S1, S2).
Populations were categorized as practicing industrialized, transitional, or non-industrialized lifestyles using the Human Development Index (HDI) and other lifestyle characteristics (15, 16); see methods for details. We created a UniFrac-based ordination from all 1,900 samples (Fig. 1A) and found that the main ordination axis of variation is strongly associated with age (EnvFit; n=1900; R² = 0.43; P = 0.001) and the second axis of variation is strongly associated with lifestyle (EnvFit; n=1900; R² = 0.50; P = 0.001). DNA extraction methods, or other study-specific aspects of data generation, may contribute to some of the differences in data between studies. Comparison of two populations within the same country but with different lifestyles (the Bassa from Nigeria (non-industrialized lifestyle) and city-dwelling Nigerian infants (transitional lifestyle)) demonstrates that shared lifestyle affects microbiota composition more than geographic proximity (Fig. 1A, right panel; Fig. S3). Similarly, the Tsimane infants in Bolivia harbor a microbiota more similar to Hadza and Bassa infants than to infants from other lifestyles in South America (Fig. S3). The microbiome of infants living industrialized lifestyles diverges from those living transitional and non-industrialized lifestyles within the first 6 months of life, and the microbiomes of infants living transitional versus non-industrialized lifestyles diverge at ~30 months of life (Fig. 1B). A lack of complete metadata precluded us from testing whether this is due to differences in feeding practices between lifestyles. Among infants living transitional lifestyles,

Members of the gut microbiome are often metabolically or ecologically linked, for example in the respective production and consumption of metabolites. We identified five microbial co-abundance groups (CAGs) in our dataset using a network inference method (17, 18), which together account for an average of 93.8% of the microbiota composition per sample (Fig. 1C; Fig.
S4). The Bifidobacterium-Streptococcus CAG dominates infants from all lifestyles in early life (0-6 months) (Fig. 1C) and yields over time in a lifestyle-specific manner. A Bacteroides-Ruminococcus gnavus CAG is enriched in industrialized infants whereas a Prevotella-Faecalibacterium CAG is enriched in infants living transitional / non-industrialized lifestyles (Fig. 1C). These differences in dominant CAGs by lifestyle become more pronounced over time and mirror taxonomic tradeoffs observed in late infancy (19) and compositional differences found in adult microbiomes (1).

We next used our deep metagenomic sequencing data of Hadza infant fecal samples, in comparison to other available infant metagenomes, to assess microbiome-encoded functional differences across samples (see methods for details). Broad lifestyle-associated differences were seen in the overall functional capacity of the infant microbiomes, along with age-related microbiome development (Fig. 2A). These metagenomic data are consistent with the 16S rRNA amplicon-based analysis (Fig. 1A), illustrating that these phylogenetic differences have functional consequences.

Hadza infant metagenomes were assembled and binned into metagenome-assembled genomes (MAGs) to investigate taxonomic novelty and sub-species variation. Of the 745 microbial species assembled from Hadza infant metagenomes, 23% (n=175) represent novel species according to UHGG (Table S7). These novel species belong to phylogenetically diverse taxonomic groups (Fig. S5A), 88.6% (n=155) were recovered from multiple Hadza samples (Fig. S5B), and their genome quality is similar to genomes in UHGG (Fig. S5C). To assess prevalence via read mapping, MAGs were integrated with genomes recovered from Hadza adults (20) and public genomes from the human gut (21) into a comprehensive database of 5,755 species-representative genomes (see methods for details).
Overall, 23.4% of microbial species detected in the Hadza infants represent novel species (Table S3). These data support that, like the adult Hadza gut, the Hadza infant gut contains extensive previously-uncharacterized diversity.

The taxonomic specificity afforded by metagenomic sequencing allowed us to identify particular species that are depleted or enriched in infants living industrialized versus non-industrialized lifestyles. Microbial species exhibiting these patterns are termed VANISH (volatile and/or negatively associated in industrialized societies of humans) and BloSSUM (bloom or selected in societies of urbanization/modernization) species, respectively (2). 310 VANISH and 12 BloSSUM species were identified among the infants in this analysis ( (Fig. S7). Second, VANISH species are over-represented shortly after birth whereas BloSSUM species occur primarily later in infancy. Many VANISH species (45.2%; 140/310) are present at 0-6 months in non-industrialized infants, while few BloSSUM species are detected this early in industrialized lifestyle infants (16.7%; 2/12) (Fig. 2B). Together these patterns suggest that more species disappear than appear as lifestyles become more industrialized, and that differences in the very early life microbiome (0-6 months) may engender alternative trajectories of microbiome assembly.

Amplicon and metagenomic data both show that Bifidobacterium is the most prevalent taxon in early life (Fig. 1C, Fig. 2B), prompting us to next investigate variation in Bifidobacterium species and strains across lifestyles. In the first 6 months of life we found that non-industrialized lifestyle infants are dominated by Bifidobacterium infantis (also known as Bifidobacterium longum subsp. infantis) (Fig. 2C), a prolific utilizer of human milk oligosaccharides (HMOs) that is associated with human health and commonly used in probiotic supplements (22).
Infants living transitional and industrialized lifestyles have expanded representation of other Bifidobacterium species with limited ability to degrade HMOs (Fig. 2C). To determine whether these species-level differences result in community-wide differences in HMO degradation capacity, we mapped our metagenomic reads to the most well-characterized genetic clusters for human milk utilization (Table S5). Five of these clusters are involved in HMO degradation (H1-H5) and one cluster is involved in nitrogen scavenging (referred to as the "urease" cluster) (22, 24), and recent studies have linked their expression in the infant gut microbiome with systemic immunological health outcomes (25). Five of the six clusters were more prevalent in non-industrialized than industrialized infants, and their prevalence among transitional infants was in between these two extremes, in all cases showing a pattern of decreasing representation as infants age (Fig. 2E). The H5 cluster, however, is found at the same prevalence level in infants from all lifestyles in early life, but exhibits continued persistence only in infants from industrialized lifestyles (Fig. 2E). The H5 cluster encodes an ABC-type transporter known to bind core HMO structures, and we found this cluster in 119 of 129 B. breve MAGs and 41 of 69 B. infantis MAGs recovered from industrialized infants (P = 1.4E-9, Fisher's exact test). The persistence of the H5 cluster beyond 12 months in industrialized infants, a time period when breastfeeding is less common in these populations, suggests this cassette of genes exists in genomes that are not reliant upon breastfeeding. Breast milk consumption among industrialized infants reduces, but does not eliminate, lifestyle-associated differences in Bifidobacterium infantis and HMO-degradation cassette prevalence (Fig. S8).
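The Fisher's exact test on H5 cluster presence can be outlined from the counts stated above. The following is a minimal sketch in Python, assuming the 2x2 table is B. breve MAGs with/without H5 (119/10) against B. infantis MAGs with/without H5 (41/28), and assuming a two-sided test that sums every table no more probable than the observed one. The function name is hypothetical, and the resulting P value need not match the reported figure if the table layout or test options used in the study differed.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Counts taken from the text: H5 present in 119 of 129 B. breve MAGs
# and in 41 of 69 B. infantis MAGs (assumed table layout).
p_h5 = fisher_exact_two_sided(119, 129 - 119, 41, 69 - 41)
print(f"Two-sided Fisher's exact P = {p_h5:.3g}")
```

The exact enumeration is feasible here because the margins are small; for larger tables a statistics library would be the more practical choice.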
Beyond the species-level Bifidobacterium differences described above, we next leveraged the assembly-based metagenomic analysis performed in this study to investigate strain-level differences among B. infantis genomes recovered from infants living different lifestyles. B. infantis MAGs from infants aged 0-1 years old (n=96 MAGs) have several functional differences, including i) enrichment in non-industrialized versus industrialized infants of Glycoside Hydrolase Family 163 (GH_163), a CAZyme involved in the utilization of complex N-glycans (including those found on immunoglobulins) (26) (Fig. S9A), ii) three Pfams that differ in prevalence in infants from different lifestyles (Fig. S9C), and iii) the identification of four gene clusters at higher prevalence in B. infantis genomes found in the Hadza versus industrialized infants (Fig. S9D). To verify the metagenome-based findings, we isolated and sequenced 20 B. infantis strains from the same Hadza infant fecal samples (Table S7). GH_163 and all four gene clusters showed enrichment among Hadza B. infantis isolates as compared to public reference genomes (Fig. S9), consistent with our MAG-based findings. Finally, strong lifestyle-specific phylogenetic clustering was observed among B. infantis isolate sequences and MAGs (Fig. 2F).

This observation of strong region-specific phylogenetic signals (Fig. 2F) could be a result of long-term, multi-generational vertical transmission (27). To assess the extent of vertical strain transmission among the Hadza, we deeply sequenced fecal samples from Hadza mothers of infants in this study (n=23 Hadza dyads). Detailed strain-tracking analysis was performed using inStrain (28) with a threshold for identical strains of 99.999% popANI (Table S6).
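The strain-sharing criterion described above can be illustrated with a small sketch. This is not inStrain's API or output format: the tuple layout, function names, and toy popANI values below are hypothetical, and only the 99.999% popANI threshold comes from the text. The sketch counts, for each infant-mother pair, how many genomes meet the threshold, then averages those counts separately for true dyads and for non-dyad pairs.

```python
from collections import defaultdict

# 99.999% popANI, the identical-strain threshold stated in the text
POPANI_THRESHOLD = 0.99999


def count_shared_strains(comparisons, threshold=POPANI_THRESHOLD):
    """Count genomes at or above the popANI threshold for each pair.

    `comparisons` is an iterable of (infant, mother, genome, popani)
    tuples; this layout is an assumption made for illustration.
    """
    shared = defaultdict(int)
    for infant, mother, _genome, popani in comparisons:
        shared[(infant, mother)] += int(popani >= threshold)
    return dict(shared)


def mean_sharing(shared, dyads):
    """Return (mean shared strains in dyads, mean in non-dyad pairs)."""
    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    dyad_counts = [n for pair, n in shared.items() if pair in dyads]
    other_counts = [n for pair, n in shared.items() if pair not in dyads]
    return mean(dyad_counts), mean(other_counts)


# Toy values invented for illustration only (not data from the study)
toy_comparisons = [
    ("infant_A", "mother_A", "genome_1", 0.999995),
    ("infant_A", "mother_A", "genome_2", 0.99990),
    ("infant_A", "mother_B", "genome_1", 0.99950),
    ("infant_B", "mother_B", "genome_3", 1.0),
    ("infant_B", "mother_A", "genome_3", 0.99900),
]
toy_dyads = {("infant_A", "mother_A"), ("infant_B", "mother_B")}

toy_shared = count_shared_strains(toy_comparisons)
toy_means = mean_sharing(toy_shared, toy_dyads)
print(toy_shared)
print(toy_means)
```

Keeping pairs with zero shared strains in the tally matters: dropping them would inflate the non-dyad average.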
Dyad pairs share far more strains on average than non-dyad pairs (6.4 vs 0.3 strains, respectively), and a higher percentage of infant strains are shared between an infant and their respective mother versus with another mother (12.4% vs 0.5%, respectively) (p < 0.001, Wilcoxon rank-sum test) (Fig. 3A). Remarkably, Hadza non-dyads living in the same bush camp also share more strains and a higher proportion of strains than those living in different bush camps (Fig. 3A) (p < 0.001, Wilcoxon rank-sum test), consistent with previously-reported increased rates of strain sharing within Fijian social networks (29). Vertically shared strains were detected among a range of different phyla in the Hadza, with Bacteroidota and Cyanobacteria having more vertically shared strains than other phyla, and Firmicutes having fewer (Fig. 3B; Fisher's exact test with false discovery rate correction). Industrialized infants also exhibit significantly increased and decreased vertical strain sharing of Bacteroidetes and Firmicutes, respectively (30). Together these results suggest that community interaction during rearing of infants and bush camp microenvironments such as water source may play important roles in increasing transmission of strains to infants, and are consistent with proximity and social interactions propagating group-microbial sharing (31).

To address whether lifestyle-dependent divergence of infant microbiotas could be explained by strain sharing between mothers and their infants, we performed the same detailed strain-tracking analysis on a previously sequenced dataset of 100 mother-infant dyads from Sweden (32). Swedish infants born via C-section were excluded from this analysis (n=17 eliminated; n=83 included).
Infants in Swedish and Hadza dyads had average ages of 1.01 ± 0.00 and 0.95 ± 0.21 years old, respectively (P=0.04, Wilcoxon rank-sum test); one difference between these studies is that Swedish mothers were sampled immediately after birth while Hadza mothers were sampled contemporaneously with infants. To identify differences between strains shared among Hadza versus Swedish dyad pairs, we performed in silico rarefaction to account for differences in sequencing depth between the studies. Bacteria of the phylum Bacteroidota (also known as Bacteroidetes) were the most commonly vertically transmitted strains in both populations (Fig. 3C). VANISH bacteria and bacteria of the genus Prevotella made up a higher portion of vertically shared strains in the Hadza, while bacteria of the genus Bacteroides and BloSSUM taxa were more commonly shared among Swedish dyads (Fisher's exact test; P < 0.01) (Fig. 3C). While we cannot exclude the possibility that the small difference in infant age between populations plays a role, the differences seen in infants largely mirror lifestyle-related compositional differences observed among adults, consistent with the finding that species that were more abundant in maternal samples were more likely to be vertically transmitted (Fig. S10). Taken together, these data suggest that vertical transmission may propagate lifestyle-dependent differences in microbiome composition (33).

Together our data show that infants from all lifestyles begin life with similar (Bifidobacteria-dominated) gut microbiota compositions, but subtle differences detected in early life compound over time. The minor taxa found by amplicon analysis to differentiate lifestyles at 0-6 months of life (Bacteroides in industrialized infants and Prevotella in non-industrialized infants) were the same taxa revealed by detailed metagenomic analysis to be the most commonly vertically transmitted.
These data suggest that vertical transmission may be a mechanism by which microbiota change is propagated over generations in response to altered lifestyles (34, 35). Important differences were also discovered in the species composition and HMO-degradation genes of the initially-dominant Bifidobacterium communities, and recent studies of these same genes suggest that their depletion in industrialized infants could have long-term negative immune consequences (25). Crucially, in almost all analyses performed in this study, infants living transitional lifestyles display intermediate phenotypes between those of industrialized and non-industrialized infants. While not conclusive, this is an important piece of evidence pointing to lifestyle as a possible causative factor in infant microbiome assembly. The Hadza-specific discoveries reported in this work (including the finding of increased non-dyad vertical transmission among members of the same bush camp, a social structure with no equivalent among industrialized communities) exemplify the importance of studying people outside of industrialized nations, and highlight the need for additional studies to provide equity in microbiome understanding across global societies. Our results also highlight the question of whether lifestyle-specific differences in the gut microbiome's developmental trajectory predispose populations to diseases common in the industrialized world, such as those driven by chronic inflammation (36, 37).
Seasonal cycling in the gut microbiome of the Hadza hunter-gatherers of Tanzania
Links between environment, diet, and the hunter-gatherer microbiome
Human gut microbiome viewed across age and geography
Temporal development of the gut microbiome in early childhood from the TEDDY study
Gut microbiomes from Gambian infants reveal the development of a non-industrialized Prevotella-based trophic network
The Hadza: Hunter-gatherers of Tanzania
Demography of the Hadza, an increasing and high density population of savanna foragers
Juvenile foraging among the Hadza: Implications for human life history
Allomaternal Care among the Hadza of Tanzania
A sparse covarying unit that describes healthy and impaired human gut microbiota development
Microbiome LPS Immunogenicity Contributes to Autoimmunity in Humans
Infant and Adult Gut Microbiome and Metabolome in Rural Bassa and Urban Settlers from Nigeria
The association of gut microbiota characteristics in Malawian infants with growth and inflammation
Microbiota assembly, structure, and dynamics among Tsimane horticulturalists of the Bolivian Amazon
Elevated rates of horizontal gene transfer in the industrialized human microbiome
FastSpar: rapid and scalable correlation estimation for compositional data
Gut microbiome transition across a lifestyle gradient in Himalaya
Developmental trajectory of the healthy human gut microbiota during the first 5 years of life
Ultra-deep Sequencing of Hadza Hunter-Gatherers Recovers Vanishing Microbes
A unified catalog of 204,938 reference genomes from the human gut microbiome
Comparative Genome Analysis of Bifidobacterium longum subsp. infantis Strains Reveals Variation in Human Milk Oligosaccharide Utilization Genes among Commercial Probiotics
Varied Pathways of Infant Gut-Associated Bifidobacterium to Assimilate Human Milk Oligosaccharides: Prevalence of the Gene Set and Its Correlation with Bifidobacteria-Rich Microbiota Formation
Broad conservation of milk utilization genes in Bifidobacterium longum subsp
Bifidobacteria-mediated immune system imprinting early in life
Complex N-glycan breakdown by gut Bacteroides involves an extensive enzymatic apparatus encoded by multiple coregulated genetic loci
Mother-to-Infant Microbial Transmission from Different Body Sites Shapes the Developing Infant Gut Microbiome
Banfield, inStrain profiles population microdiversity from metagenomic data and sensitively detects shared microbial strains
Transmission of human-associated microbiota along family and social networks
Infant gut strain persistence is associated with maternal origin, phylogeny, and traits including surface adhesion and iron acquisition
Cospeciation of gut microbiota with hominids
Dynamics and Stabilization of the Human Gut Microbiome during the First Year of Life
The Past and Future Biology of the Human Microbiome in an Age of Extinctions
Diet-induced extinctions in the gut microbiota compound over generations
US Immigration Westernizes the Human Gut Microbiome
Vulnerability of the industrialized microbiota
The ancestral and industrialized gut microbiota and implications for human health

We thank David Relman and Chris Damman for helpful discussion and input throughout the project conceptualization and analysis. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
This research utilizes data obtained by the TEDDY study group, a collaborative clinical study sponsored by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institute of Allergy and Infectious Diseases (NIAID)
This manuscript was not prepared in collaboration with investigators of the TEDDY study and does not necessarily reflect the opinions or views of the TEDDY study, dbGaP, or the NIDDK.

Funding

This work was funded by grants from the National Institutes of Health (DP1-AT009892 and R01-DK085025 to JLS), an NSF Graduate Research Fellowship to DD (DGE-1656518) and to BDM (DGE-114747), a Stanford Graduate Smith Fellowship to DD, and National Institutes of Health grant F32DK128865 to MRO.

We acknowledge the numerous people and organizations who provided logistical support and conducted sample collection in the USA, Tanzania, and Nepal, including Dorobo Safaris, the

The authors declare no conflict of interest.

The authors declare that the data supporting the findings of this study are available within the paper and its supplementary information files. Metagenomic reads and genomes generated in this study are available under BioProject PRJEB27517. Accession numbers for individual samples and genomes are available in Tables S2 and S7.

Figures S1-S10
Tables S1-S7
References (38-71)