key: cord-0287744-mqg6x6we authors: Chrisman, Brianna; He, Chloe; Jung, Jae-Yoon; Stockham, Nate; Paskov, Kelley; Washington, Peter; Wall, Dennis P. title: Transmission Dynamics of Human Herpesviruses and Other Blood DNA Viruses from Whole Genome Sequences of Families date: 2022-02-02 journal: bioRxiv DOI: 10.1101/2022.01.31.478555 sha: 7656ce5cb2b6fc7f9ba832d274961d34e1c314e2 doc_id: 287744 cord_uid: mqg6x6we

While hundreds of thousands of human whole genome sequences (WGS) have been collected in the effort to better understand genetic determinants of disease, these whole genome sequences have rarely been used to study another major determinant of human health: the human virome. Using the unmapped reads from WGS of 1,000 families, we present insights into the human blood DNA virome. In addition to extensively cataloguing the viruses detected in WGS of human whole blood and lymphoblastoid cell lines, we use the family structure of our dataset to show that household drives transmission of many microbes. We also identify several cases of inherited chromosomally integrated HHV-6A and HHV-6B and locate candidate integration sequences for these cases. We document genetic diversity within exogenous and integrated HHV species and within integration sites of HHV-6. Finally, in the first observation of its kind, we present evidence that suggests widespread de novo HHV-6B integration and HHV-7 episome replication in lymphoblastoid cell lines. These findings show that the unmapped read space of WGS may be a promising avenue for virology research.

As the cost and speed of whole genome sequencing (WGS) continue to improve, many research institutions have undertaken large-scale whole genome sequencing studies in an effort to better understand genetic determinants of human diseases [51, 33, 27, 11].
While high coverage (>30x) WGS produces several hundred gigabytes of raw data per sample [36], in many pipelines up to 30% of these reads go unused because they fail to align to the human reference genome [45]. These unmapped reads may originate from non-reference human DNA sequences, organic reagents and contamination, and human viruses. Meanwhile, the last decade of advances in sequencing has also empowered the field of metagenomics and the study of the human microbiome, areas where next-generation sequencing technologies have allowed for the rapid characterization of bacteria, small eukaryotes, and viruses that inhabit human environments. While much of microbiome and virome research has focused on the gut microbiome [50, 32, 20], which has clear communication links with the human digestive, nervous, and immune systems, it has recently been suggested that microbiota with low microbial loads may also play novel roles in disease [12, 5, 54]. One such microbiota is that of human blood, and it is still up for debate whether there is a healthy blood microbiota or whether the presence of bacteria inherently indicates disease [6, 46, 22]. Several studies have investigated the blood bacteriome in an attempt to understand the healthy blood bacterial microbiome, but only a handful of studies have attempted to characterize the human blood virome, and these have mostly been done in diseased cohorts [37, 10, 4]. Furthermore, despite the importance of the blood virome in blood transfusion and stem cell transplant safety [18, 52], research in emerging pathogens [25, 15], and immune system regulation [16], to our knowledge only one study has analyzed the blood virome on the scale of thousands of individuals [35]. The early stages of the SARS-CoV-2 pandemic underscored how important it is to understand transmission patterns for different viruses [31].
Particularly in the case of intrafamilial transmission, non-sexual transmission between members of

3 Results

Using unmapped or poorly aligned reads from WGS of 4,569 individuals, we were able to reclassify reads to over 100 species of viruses. We show the top 50 most abundant viruses in Fig. 2, clustered by Spearman association across samples. Of note, we see four important categories of viruses: Human herpesviruses (HHV) 6A, 6B, and 7 are common blood viruses that are normally acquired during childhood. HHV-6 can integrate into host genomes and can be inherited in a Mendelian fashion following ancient integration events in germline cells. HHV-6A, 6B, and 7 viral reads are likely true HHVs present in the blood, and we discuss the viral load profiles of HHV-6 and HHV-7 in depth later on. Lambda phage PhiX is a common reagent used in sequencing pipelines to calibrate Illumina machines and balance GC content. Reads classified as PhiX relatives are probably either mismappings to homologous regions or contamination of the commercial PhiX reagents [35]. Similarly, Epstein-Barr virus (EBV), or human gammaherpesvirus 4, is used to immortalize lymphoblastoid cell lines (LCLs), and so EBV and its relatives are probably artifacts of the LCL immortalization pipeline. Torque teno viruses (TTV) and erythroviruses are fairly common blood viruses that are usually acquired during childhood. We suspect the TTV and erythrovirus reads are true reads originating from an active TTV or erythrovirus infection. Using an F-regression of viral load against sequencing plate, biological sample source (WB vs. LCL), and sample metadata (such as autism phenotype, sex, household/family, and parent vs. child status), we identified several viruses significantly associated with sequencing plate (Fig. 2B), biological sample type (Fig. 2D-E), and household/family (Fig. 2C).
One reason viruses may show different abundances between biological sources is that they preferentially infect lymphocytes or a different type of blood cell. This is likely the case for TTV and human herpesviruses, both of which have been shown to preferentially infect specific types of blood cells over others [47, 2]. Another reason for biological source-dependent abundance is that different reagents and pipelines are used in LCL versus whole blood preparation and storage, which may lead to different contamination profiles. This is probably the case for EBV (human gammaherpesvirus 4) and its relatives, which are enriched in LCLs (EBV is acquired during the EBV-induced immortalization step), as well as for the non-herpesviruses enriched in whole blood samples. Similarly, viruses enriched in specific sequencing plates (Fig. 2B) indicate batch contamination introduced during sequencing preparation and sequencing. Interestingly, we found several viruses associated with household/family, indicating that family members may be transmitting an active infection within their household. Even at low counts, we see a statistically significant family association for torque teno virus, as well as for erythroparvovirus. We also see an association between family and human herpesviruses. This particular association is likely driven by two mechanisms: inherited chromosomally integrated human herpesvirus (iciHHV), which is passed down from parent to child through Mendelian inheritance, and active infections being transmitted within a household. We discuss the differences in viral load profiles of the different human herpesviruses later on.

Human herpesvirus 6 can integrate into host genomes, and ancient germline integration events can be seen in the present day as genotypes inherited in a Mendelian fashion. We identified 28 samples (0.6% of samples, 14 with iciHHV-6A and 14 with iciHHV-6B) with a high likelihood of having iciHHV-6A or iciHHV-6B.
These samples had HHV read counts consistent with 1 copy of HHV per cell (or 0.5 HHV genomes/human genome copy), and had a parent or child in the same family who also had high HHV-6A or 6B counts. The HHV counts of these samples and others are shown in Fig. 3A. The probable iciHHV-6B samples came from both WB and LCLs. Although all 14 cases of iciHHV-6A were found in LCL samples, LCL samples outnumber WB samples 10-fold, so this was not statistically significant. There was no overlap between the samples with likely iciHHV-6A and those with likely iciHHV-6B. Additionally, no samples showed evidence of homozygous iciHHV (a copy of iciHHV inherited from each parent). Additional evidence for iciHHV-6 comes from the coverage profiles of samples with high HHV-6A and high HHV-6B (Fig. 3B-C). Although there are some regions with slightly lower or higher average coverage (corresponding to homologous regions between 6A and 6B, low complexity regions, or high GC content regions), no single region dominates the coverage profile, indicating that full HHV-6 viral genomes exist in these samples and are not artifacts of mismappings. Similar coverage profiles were found for HHV-6B with medium viral loads (Fig. 3D) and HHV-7 in samples with both high and medium viral loads (Fig. 3E-F). While HHV-6A abundance showed a clear bimodal distribution, HHV-6B had a more continuous distribution in the LCLs, with many samples having abundances >0.1 copies/genome but not showing inheritance patterns consistent with iciHHV. Thus, HHV-6B has a chromosomal integration pattern that suggests two distinct types of integration events: iciHHV-6 and de novo integration events in lymphocytes. We note that the coverage profiles suggest only minimal mismappings between HHV-6A and HHV-6B at conserved regions, and no mismappings from or correlation to gammaherpesvirus 4 (Fig. ??).
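The two criteria used above for calling a likely iciHHV-6 sample (a viral load near 0.5 HHV genomes per human genome copy, plus a parent or child whose load is also in that range) can be sketched as follows. This is only an illustration and not the analysis code used in this study: the function name, the input dictionaries, and the tolerance band of 0.3 to 0.7 are all assumptions chosen for the example.

```python
# Hypothetical tolerance band around the expected heterozygous iciHHV load of
# 0.5 viral genomes per human genome copy. The bounds are assumptions.
ICI_LOW, ICI_HIGH = 0.3, 0.7


def flag_ici_candidates(load, relatives):
    """Return the sorted ids of samples that look like iciHHV carriers.

    load:      dict mapping sample id -> viral genomes per human genome copy
    relatives: dict mapping sample id -> iterable of parent/child sample ids
    A sample is flagged when its own load is inside the band AND at least one
    parent or child also has a load inside the band.
    """
    def in_band(sample):
        return ICI_LOW <= load.get(sample, 0.0) <= ICI_HIGH

    return sorted(
        sample
        for sample in load
        if in_band(sample)
        and any(in_band(relative) for relative in relatives.get(sample, ()))
    )


if __name__ == "__main__":
    example_load = {"mother": 0.52, "child1": 0.48, "child2": 0.0, "father": 0.01}
    example_relatives = {
        "mother": ["child1", "child2"],
        "father": ["child1", "child2"],
        "child1": ["mother", "father"],
        "child2": ["mother", "father"],
    }
    print(flag_ici_candidates(example_load, example_relatives))
```

Requiring a first-degree relative inside the same band is what separates inherited integration from a sample that merely has a high load, such as the LCL samples with intermediate HHV-6B abundance discussed above.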
We found that reads mapped to the end of HHV-6A and HHV-6B frequently had a mate mapped to a specific region on the decoy reference genome, chrUn_JTFH01000690v1_decoy (Fig. ??A-D). This reference sequence is probably an unplaced telomeric sequence, and serves as an HHV integration site. After de novo assembling and clustering these potential integration sites, we see common integration sequences for HHV-6A and HHV-6B. In HHV-6B, both probable iciHHV and de novo integrated samples share 3 canonical integration sequences. While LCLs are immortalized using Epstein-Barr virus (human herpesvirus 4), HHV-6 does not play a role in the LCL pipeline, nor did we find any relationship between HHV-4 and HHV-6B viral loads. It has been shown that HHV-6 infects lymphocytes and establishes latency via chromosomal integration [17]. HHV-6B is also extremely prevalent in most populations, more so than the much rarer HHV-6A [14]. Therefore, a plausible explanation for this spectrum of HHV-6B abundance is that HHV-6B is chromosomally integrated in a fraction of the cells in the sample: during an HHV-6B infection, HHV-6B established latency via integration into one or more lymphocytes. Genetic drift, natural selection, or reactivation during that person's life and during LCL passaging causes different samples to have different fractions of infected cells. HHV-7 did not show any evidence of iciHHV: no WB samples had high counts of HHV-7, and none of the LCL samples with HHV-7 counts consistent with iciHHV have parent-offspring relationships. This is consistent with findings that HHV-6, but not HHV-7, can integrate into host chromosomes [26]. However, like HHV-6B, HHV-7 shows a continuous distribution in the LCLs, suggesting that HHV-7 latency combined with genetic drift, positive selection, or reactivation causes different samples to have different fractions of HHV-7 genomes.
On the other hand, HHV-7 reads did not consistently pair with any hg38 or decoy contig, nor did they frequently pair with unmapped reads. This is in agreement with the differences between HHV-6 and HHV-7 infection: while HHV-6 establishes latency primarily via chromosomal integration, HHV-7 establishes latency by forming an episome inside the host nucleus [14]. We thus hypothesize that a primary infection causes a small fraction of lymphocytes in a sample to initially contain latent HHV-7. Either through replication of the episome, or through reactivation and infection of additional cells in the sample, HHV-7 increases its load throughout LCL immortalization and passaging. We also ran the same HHV alignment pipeline on unmapped reads from the 1000 Genomes dataset [38] of high coverage WGS from around the world. We did not find a continuous distribution of HHV-6B; rather, we found a bimodal distribution, with most samples having almost 0 HHV-6B read counts and <1% of samples having HHV-6B read counts consistent with iciHHV-6B. In the 1000 Genomes cohort, in which WGS was also derived from LCLs, we also found only one case of medium abundance HHV-6B (500 reads), with the rest of the samples having <10 reads aligning to HHV-7. Notably, we found HHV-7 and HHV-6B to be more abundant in children than in parents in our dataset. Because the 1000 Genomes data we used was all from adults, we hypothesize that childhood infection (coupled with de novo integration or latent establishment of HHV-6B and HHV-7 in lymphocytes) is driving the unusual distributions of HHV-6B and HHV-7 in the iHART dataset. Alternatively, the immortalization and storage processes in the iHART dataset may be increasing integration and intra-sample re-infection rates.

We wished to understand the origins and diversity of circulating, latent, and iciHHV.
We de novo reconstructed the HHV genome from each sample where possible (de novo assembly failed for samples with low HHV read counts), and compared genomes using MAFFT multiple sequence alignment and ClustalW phylogenetic tree generation (see Methods). As seen in Fig. 4, HHV-6A, 6B, and 7 all exhibit genetic diversity across our samples. HHV-6A genomes fall into three distinct clusters. Family members always fall into the same clade, presumably because these are cases of iciHHV and parents always pass on the same variant of HHV to their offspring. HHV-6B also exhibits genetic diversity, with genomes in many different clusters that are less distinct than those of HHV-6A. Notably, samples with likely iciHHV-6B always fall into the same clade as their family members, and the HHV genomes from these families are also very closely related phylogenetically. HHV-7 also exhibits genetic diversity, and does not seem to originate from a single source (as might be the case if HHV-7 were a contaminant). Interestingly, HHV-7 genomes from members of the same family tended to be much closer phylogenetically than HHV-7 genomes from unrelated individuals (Mann-Whitney U test using distance matrix values, p-value < .05). After removing the suspected iciHHV cases, HHV-6B showed the same trend (p-value < .05). This may indicate that the HHV-6B and HHV-7 variants that established themselves in LCLs originated from an initial infection that was spread within a household.

Using megahit, MAFFT, and ClustalW, we de novo assembled, aligned, and built a phylogenetic tree from the reads that did not align to HHV-6A, 6B, or 7 but had mates that aligned to HHV. HHV-7 had very few such reads, and thus de novo assembly was not possible in any sample. However, HHV-6A and HHV-6B show clear canonical flanking sequences, which we refer to as candidate integration sites (Fig. ??). Interestingly, there is little variation within the 5' and 3' integration sites for HHV-6A.
Small single-nucleotide differences are shared among family members, indicating inherited integrated viruses and sites. HHV-6B 5' and 3' flanking regions also cluster into clear canonical candidate integration sites. Both the 3' and 5' sites fall into 3 distinct clusters with highly dissimilar sequences. Family members with suspected iciHHV-6B usually fall within the same cluster; however, in the 5' flanking integration site, families AU0412, AU2140, and AU4056 fall into separate clusters, and in the 3' flanking region, members of family AU4056 fall into separate clusters. When we matched the candidate integration sites to public sequences using NCBI's BLAST, all sequences matched isolate HHV or endogenous HHV sequences. In particular, sequences matched those from studies of integrated HHV diversity [3, 56, 57, 21, 23].

Using whole genome sequences, we extensively catalogue DNA viruses present in human whole blood and lymphocytes. Additionally, we found several viruses that are often transmitted within families. In particular, erythroviruses and torque teno viruses may be transmitted within households, though the mechanism of the particular transmissions in our dataset remains unknown. Previous studies have identified both transplacental and fecal-oral modes of transmission in torque teno viruses [48]. Erythroviruses can also be transmitted transplacentally, and more commonly through respiratory droplets [28]. We additionally identified 28 cases of suspected iciHHV-6. We show that integrated herpesviruses are genetically diverse, with variable genomes and integration sites across families. Additionally, herpesviruses seem to be often transmitted within families, as samples from family members more often contain the same exogenous HHV-6B and HHV-7 variant than those from unrelated individuals. It may also be that common variants within families are the result of variation in herpesviruses specific to different regions in the U.S. [21].
HHV has been implicated in several diseases such as multiple sclerosis, encephalomyelitis, and febrile convulsions [13]. Genetic differences in exogenous HHV and iciHHV and its integration sites could influence disease pathology and contribute to different incidences of disease across different regions of the world [49]. Analyzing the unmapped read space of whole genome sequencing data is an accessible way to better understand HHV diversity and its possible role in disease.

To our knowledge, this is the first study to show evidence of widespread replication of non-inherited HHV-6B and HHV-7 in LCLs using thousands of samples. Moreover, previous studies have shown that HHV-6 and HHV-7 preferentially infect (CD4+) T-lymphocytes. However, the LCLs from the iHART dataset are derived from B-lymphocytes, indicating that B-lymphocytes may be an underappreciated route for HHV infection. We hypothesize that the replication of HHV-6B and HHV-7 occurs by separate but related mechanisms. We hypothesize that, following a primary infection of one or more donor lymphocytes, HHV-6B integrated de novo into the host chromosomes (as evidenced by the reconstructed integration sequences), either while still in the host or during the process of LCL immortalization and storage. Genetic drift, positive selection, or reactivation then increased the fraction of cells with an integrated virus over time, leading to varying loads of HHV-6B across samples. HHV-7, on the other hand, is not known to chromosomally integrate, but rather establishes latency via an episome inside the nucleus. Similar to HHV-6B, we hypothesize that a primary infection of HHV-7 results in an HHV-7 episome inside a small fraction of cells. The episome replicates in tandem with the host chromosomes. Again, genetic drift or positive selection increases the fraction of cells with episomal HHV-7.
This would be very similar to the life cycle of Kaposi's sarcoma herpesvirus (HHV-8) [24], which establishes latency via an extrachromosomal nuclear episome that colocalizes to the chromosomes in order to replicate in tandem with the host cell.

In this study, we have used the unmapped read space of whole genome sequences to better understand the prevalence and intra-family transmission patterns of various blood viruses. To our knowledge, this is the first study to use large WGS datasets of families to study viral transmission. Additionally, the unique family structure of our dataset allowed us to identify likely cases of iciHHV-6A and iciHHV-6B and to document the genetic and integration site variation within these species. This is also the first study to observe and hypothesize about widespread de novo HHV-6B integration and HHV-7 replication in LCLs. We hope this encourages further research on HHV-6 and HHV-7 integration and latency. The samples in our dataset with these unique distributions are available for future research upon request and application. We performed these analyses using a collection of WGS data that was generated for an unrelated purpose (to understand the genetic components of autism). We suspect whole genome sequences contain a wealth of untapped data and may be valuable resources beyond their traditional GWAS use cases. In particular, as more WGS data is generated from diverse global populations, the unmapped read space could be used to track the spread and geography of various viruses.

We obtained whole genome sequencing (WGS) data from the Hartwell Autism Research and Technology Initiative (iHART) database, which includes 4,842 individuals from 1,050 multiplex families in the Autism Genetic Resource Exchange (AGRE) program [44]. A total of 4,568 individuals from 1,004 families passed quality control and were included in the analyses.
DNA samples were derived from whole blood (WB) or lymphoblastoid cell lines (LCL) and sequenced at the New York Genome Center. All WGS data from the iHART database have been previously processed using a standard bioinformatics pipeline that follows GATK's best practices workflows. Raw reads were aligned to the human reference genome build 38 (GRCh38_full_analysis_set_plus_decoy_hla.fa) using the Burrows-Wheeler Aligner (bwa-mem). We excluded secondary alignments, supplementary alignments, and PCR duplicates from downstream analyses. We extracted reads from the iHART genomes that were unmapped to hg38 and the decoy reference or were mapped with low confidence. Low-confidence reads were defined as reads that were marked as improperly paired and had an alignment score below 100. We used alignment score rather than mapping quality in order to select for reads that were likely not true alignments to the human reference genome, rather than for reads that had ambiguous alignments to hg38. These reads were then re-paired if both ends needed to be realigned, and lastly separated into single-end and paired-end files.

We used Kraken2 [55] to align the unmapped and poorly aligned reads to the Kraken default (RefSeq) databases of archaeal, bacterial, human (GRCh38.p13), and viral sequences [39]. These reference databases were accessed on Feb 16, 2021. Kraken2 was run on the unmapped and poorly mapped reads from each sample, using the default parameters. Because Kraken was able to map the majority of reads down to the species or strain level, Kraken classifications were aggregated by species before downstream analysis.

To analyze the effect of various demographic parameters (such as household, autism status, and sex) and experimental parameters (such as sequencing plate and sample type) on microbial and viral profile, we performed an F-regression analysis.
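Returning to the read-selection rule described above (keep reads that are unmapped, or that are improperly paired with an alignment score below 100, after dropping secondary alignments, supplementary alignments, and PCR duplicates), the following is a minimal sketch of that decision for a single alignment record. It is an illustration under stated assumptions, not the pipeline code: the function name is hypothetical, the alignment score is assumed to come from the aligner's `AS` tag, and reads without a score are assumed not to count as low-confidence. The flag bit values are those defined in the SAM specification.

```python
# SAM flag bits, as defined in the SAM specification.
FLAG_PAIRED = 0x1
FLAG_PROPER_PAIR = 0x2
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTARY = 0x800

# Alignment-score cutoff stated in the text.
AS_CUTOFF = 100


def keep_for_reclassification(flag, alignment_score=None):
    """Decide whether one alignment record should be passed on to Kraken2.

    flag:            integer SAM flag of the record
    alignment_score: value of the AS tag, or None if the tag is absent
                     (assumption: a missing score is never "low-confidence")
    """
    # Secondary/supplementary alignments and PCR duplicates are excluded.
    if flag & (FLAG_SECONDARY | FLAG_DUPLICATE | FLAG_SUPPLEMENTARY):
        return False
    # Unmapped reads are always kept.
    if flag & FLAG_UNMAPPED:
        return True
    # Low-confidence reads: improperly paired AND alignment score below cutoff.
    improperly_paired = bool(flag & FLAG_PAIRED) and not (flag & FLAG_PROPER_PAIR)
    low_score = alignment_score is not None and alignment_score < AS_CUTOFF
    return improperly_paired and low_score


if __name__ == "__main__":
    print(keep_for_reclassification(FLAG_PAIRED | FLAG_UNMAPPED))   # unmapped read
    print(keep_for_reclassification(FLAG_PAIRED, 80))                # improper pair, low score
    print(keep_for_reclassification(FLAG_PAIRED | FLAG_PROPER_PAIR, 80))
```

In practice such a predicate would be applied to each record while streaming a BAM or CRAM file with a library such as pysam, which this sketch deliberately does not depend on.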
We chose an F-regression because many variables were highly collinear with each other: for example, samples from the same household were nearly always sequenced on the same sequencing plate, autism is much more prevalent in males, and the same sample types were normally collected from households. For each microbe, we built an ordinary least squares (OLS) model, using as our regressors an indicator matrix of sample type, sex, child vs. parent, autism status, sequencing plate, household/family, and sample id, and as our response variable the log-normalized counts of microbes (with pseudo-counts of 1). Using the statsmodels library, we then ran a forward OLS regression in which we iteratively selected the regressor features that best explained the previous model's residuals, and ceased adding features when the ANOVA score between the previous and new models was no longer statistically significant (significance threshold: adjusted p-value < .05).

Using bwa-mem with the default parameters, we aligned all reads classified by Kraken as belonging to herpesviruses to a set of reference genomes consisting of hg38 and the decoy, and all the herpesvirus genomes present in the RefSeq database. Most importantly, this included human betaherpesvirus 6A (NC_001664.4), human betaherpesvirus 6B (NC_000898.1), human betaherpesvirus 7 (NC_001716.2), and both the decoy and RefSeq genomes for human gammaherpesvirus 4, or Epstein-Barr virus (chrEBV in the decoy genome, and NC_007605.1 in RefSeq). We performed the same analysis using 2504 high-coverage WGS LCL samples from the most recent release of the 1000 Genomes dataset (Fig. ??). To convert the herpesvirus read counts to viral genomes per host genome, we normalized against the average coverage of two housekeeping genes that are not known to show copy number variation, EDAR and HBB [43]. We used pysam [1] and an in-house script to collect genome-wide coverages for different combinations of pairings in order to generate the coverage graphs in Fig. 3 and Fig. 5.
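The normalization of herpesvirus read counts against housekeeping-gene coverage can be illustrated with a small worked example. This is a sketch under stated assumptions rather than the in-house script: the function names, the read length, the genome length, and the coverage values below are hypothetical. It also assumes that coverage over an autosomal housekeeping gene reflects two copies per cell, so that a ratio of 0.5 corresponds to one viral copy per cell, matching the figure quoted for iciHHV in the Results.

```python
def mean_coverage(read_count, read_length, genome_length):
    """Approximate mean depth of a genome from the number of aligned reads."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return read_count * read_length / genome_length


def viral_genomes_per_host_genome(viral_coverage, housekeeping_coverages):
    """Ratio of viral depth to the average depth over the housekeeping genes.

    housekeeping_coverages: mean depths over the reference genes (here two,
    standing in for EDAR and HBB).
    """
    host = sum(housekeeping_coverages) / len(housekeeping_coverages)
    if host <= 0:
        raise ValueError("housekeeping coverage must be positive")
    return viral_coverage / host


def copies_per_cell(ratio):
    """Assumption: autosomal host coverage represents two copies per cell."""
    return 2 * ratio


if __name__ == "__main__":
    # Hypothetical sample: 15x mean depth over the viral genome and about
    # 30x mean depth over the two housekeeping genes.
    ratio = viral_genomes_per_host_genome(15.0, [31.0, 29.0])
    print(ratio, copies_per_cell(ratio))
```

With these hypothetical inputs the ratio is 0.5 viral genomes per human genome copy, that is, one copy per cell, which is the load expected for a heterozygous inherited integration.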
To generate the integration site assemblies and alignments (Fig. 5), we first extracted reads that were not classified as herpesvirus reads but had a mate that aligned to the start or end of the herpesvirus genome. For each individual, we de novo assembled these reads. Using MAFFT [?], we then performed multiple sequence alignment of these assemblies, and used ClustalW to generate phylogenetic trees. We used the default parameters for MAFFT, and allowed for reverse complementary sequences to be generated as needed. Before generating phylogenetic trees, we attempted to remove redundant sequences that might correspond to a forward sequence and its reverse complement. Specifically, if a sample had two assembled sequences (presumably corresponding to a forward sequence and a reverse complementary sequence), we removed the sequence that had fewer matches to the consensus sequence generated from all samples. We used ClustalW on the EMBL browser [30, 34], with a neighbor-joining algorithm, no distance correction, and ignoring gaps. We ran BLAST on these assembled sequences against NCBI's nt nucleotide collection using the default parameters and without masking low-complexity regions.

To generate the assemblies of the viral genomes, we extracted reads aligned to HHV-6A, HHV-6B, and HHV-7. We used bcftools to perform variant calling on all of the samples against the reference HHV-6A, HHV-6B, and HHV-7 genomes. We used VCF2phylip [41] to convert the variant calls to alternate reference sequences. We filtered to samples that had variants or reference alleles called at 50% or more of loci. As with the integration sites, we performed multiple sequence alignment on the reconstructed viral genomes using MAFFT with the default parameters and generated phylogenetic trees using ClustalW with the same parameters as above. We used Biopython's Phylo library [9] and an in-house Python script to generate the sequence alignment trees and diagrams used in Fig. 4 and Fig. 5.

Analysis code and scripts, as well as the sequences used for Kraken classification and herpesvirus realignment, can be found at https://github.com/briannachrisman/blood_microbiome. Access to reads from the iHART dataset can be requested via www.ihart.org.

Laboratory and clinical aspects of human herpesvirus 6 infections Evolutionary History of Endogenous Human Herpesvirus 6 Reflects Human Migration out of Africa Whole blood human transcriptome and virome analysis of ME/CFS patients experiencing post-exertional malaise following cardiopulmonary exercise testing The human skin microbiome The healthy human blood microbiome: Fact or fiction? Frontiers in Cellular and Infection Microbiology Analysis of Sex and Recurrence Ratios in Simplex and Multiplex Autism Spectrum Disorder Implicates Sex-Specific Alleles as Inheritance Mechanism A method for localizing non-reference sequences to the human genome Wilczynski, and Michiel J.L. De Hoon. 
Biopython: Freely available Python tools for computational molecular biology and bioinformatics Human microbiome: an academic update on human body site specific surveillance and its possible role Human herpesvirus type 6 and human herpesvirus type 7 infections of the central nervous system HHV-6A, 6B, and 7: Persistence in the population, epidemiology and transmission Plasma virome of 781 Brazilians with unexplained symptoms of arbovirus infection include a novel parvovirus and densovirus Torque Teno virus load as a surrogate marker for the net state of immunosuppression: The beneficial side of the virome Chromosomal integration by human herpesviruses 6A and 6B Marseillevirus, blood safety, and the human virome Inherited chromosomally integrated human herpesvirus 6 as a predisposing risk factor for the development of angina pectoris The Gut Virome Database Reveals Age-Dependent Patterns of Virome Diversity in the Human Gut Comparative genomic, transcriptomic, and proteomic reannotation of human herpesvirus 6 Response to: 'Circulating microbiome in blood of different circulatory compartments Comparison of the Complete DNA Sequences of Human Herpesvirus 6 Variants A and B Kaposi's sarcoma herpesvirus genome persistence Virome analysis of transfusion recipients reveals a novel human virus that shares genomic features with hepaciviruses and pegiviruses Herpesvirus telomeric repeats facilitate genomic integration into host telomeres and mobilization of viral DNA during reactivation Genome Sequencing to Characterize Monogenic and Polygenic Contributions in Patients Hospitalized With Early-Onset Myocardial Infarction. Circulation Erythrovirus B19 infection in humans Chromosomally integrated human herpesvirus 6 in heart failure: Prevalence and treatment Clustal W and Clustal X version 2.0. Bioinformatics Transmissibility and transmission of respiratory viruses NIH Human Microbiome Portfolio Analysis Team lita. 
proctor@ nih A review of 10 years of human microbiome research activities at the us national institutes of health, fiscal years Exploring the genetic architecture of inflammatory bowel disease by whole-genome sequencing identifies association at ADCY7 The blood DNA virome in 8,000 humans Whole genome sequencing analysis for cancer genomics and precision medicine Viral metagenomics in Brazilian multiply transfused patients with sickle cell disease as an indicator for blood transfusion safety Department of Health Social Care. 100,000 whole genomes sequenced in the NHS -GOV Reference sequence (RefSeq) database at NCBI: Current status, taxonomic expansion, and functional annotation Role of intrafamilial transmission in high prevalence of hepatitis C virus in Egypt Ortiz. vcf2phylip v2.0: convert a VCF matrix into several matrix formats for phylogenetic analysis Estimating sequencing error rates using families Inherited Chromosomally Integrated Human Herpesvirus 6 Demonstrates Tissue-Specific RNA Expression In Vivo That Correlates with an Increased Antibody Immune Response Inherited and De Novo Genetic Risk for Autism Impacts Shared Networks From trash to treasure: Detecting unexpected contamination in unmapped NGS data Eicke Latz, Benjamin Lelouvier, Jonel Trebicka, and Manimozhiyan Arumugam. 
Circulating microbiome in blood of different circulatory compartments TT virus (TTV) loads associated with different peripheral blood cell types and evidence for TTV replication in activated mononuclear cells Human anelloviruses: an update of molecular, epidemiological and clinical aspects Whole genome diversity of inherited chromosomally integrated HHV-6 derived from healthy individuals of diverse geographic origin A core gut microbiome in obese and lean twins The 100 000 Genomes Project: Bringing whole genome sequencing to the NHS Human pegivirus persistence in human blood virome after allogeneic haematopoietic stem-cell transplantation Intrafamilial Transmission and Family-Specific Spectra of Cutaneous Betapapillomaviruses The closed eye harbours a unique microbiome in dry eye disease Improved metagenomic analysis with Kraken 2 Variation in human herpesvirus 6B telomeric integration Inherited Chromosomally Integrated Human Herpesvirus 6 Genomes Are Ancient, Intact, and Potentially Able To Reactivate from Telomeres Thank you to The Hartwell Foundation for supporting the creation of the iHART database and the Simons Foundation for additional support for genome sequencing. We thank the New York Genome Center for conducting sequencing and initial quality control of the iHART dataset. We thank Amazon Web Services for their grant support for the computational infrastructure and storage for the iHART database. This work has been supported by grants from The Hartwell Foundation and the NIH (U24 MH081810, R01MH064547, NS101158, NS070911, NS101665, NS095824, S10OD011939, P30AG10161, R01AG17917, and U01AG61356 ) and from the Stanford Precision Health and Integrated Diagnostics Center and from the Stanford Bio-X Center. Thank you to Jesse Arbuckle and Louis Flamand for their advice and discussions on the HHV distributions. Still Unmapped