key: cord-0304355-besbroy6 authors: Friedenson, Bernard title: Chromosome breaks in breast cancers occur near herpes tumor virus sequences date: 2021-11-10 journal: bioRxiv DOI: 10.1101/2021.11.08.467751 sha: 0681d70544dae0f0fbf180f37d95edf4320704dc doc_id: 304355 cord_uid: besbroy6

This work finds that viral DNA is associated with most chromosome breaks in breast cancer and proposes a mechanism to explain why. Nearly 2000 breast cancers were compared with cancers known to be associated with Epstein-Barr virus (EBV) variants, using publicly available data. On all chromosomes, breast cancer breakpoints cluster around the same positions as breakpoints in nasopharyngeal cancers (NPCs), which are 100% associated with EBV variants. Breakpoints also gather at the same differentially methylated regions. Breast cancer further shares with other cancers an EBV methylation signature that inactivates complement. Another known EBV-associated cancer, Burkitt's lymphoma, has distinctive MYC gene breakpoints surrounded by EBV-like DNA. EBV-like DNA consistently surrounds breast cancer breakpoints, which are often near known EBV binding sites. EBV explains why, in breakage-fusion-bridge models, a broken chromosome does not simply rejoin but instead destabilizes the entire genome. This work does not prove that EBV variants cause breast cancer, but it establishes links between EBV variants and high-risk chromosome breaks and other changes.

By the time breast cancer occurs, its causes are hopelessly buried under multiple risk factors, thousands of mutations, and a lifetime of exposure to mutagens. One risk factor is Epstein-Barr virus (EBV/HHV4), which persists as a latent infection in almost all humans (>90%). Epidemiological associations between breast cancer and EBV infection have been found in different geographical locations (Fina et al., 2001; Peng et al., 2014; Sinclair et al., 2021).
Although most people control EBV infection (Fina et al., 2001; Lawson and Glenn, 2021), meta-analyses of 16 and 10 studies found that EBV increases breast cancer risk 4.75-fold and 6.29-fold, respectively. An analysis of 24 case-control studies showed that EBV is significantly more prevalent in breast cancer tissues than in normal and benign controls (Lawson and Glenn, 2021). Infection occurs early: 50% of children aged 6-8 are already seropositive, and seropositivity rises to 89% at ages 18-19 (Balfour et al., 2013). In breast epithelial cell models, EBV infection facilitates malignant transformation and tumor formation (Hu et al., 2016). Breast cancer cells from biopsies express latent EBV infection gene products (LMP-1, -2, EBNA-, and EBER) (Ayee et al., 2020), even after excluding the possibility that the virus comes from lymphocytes (Lorenzetti et al., 2010). Evidence of the activated, lytic form of EBV in breast cancer predicts a worse outcome (Marrao et al., 2014). Hereditary BRCA1-mutation-associated breast cancer tissues express EBV gene products (Lawson and Glenn, 2021). EBV can trans-activate endogenous retroviruses (Bruce et al., 2021), but unlike retroviruses, EBV lacks an integrase enzyme. Reported EBV integration sequences are short (Xu et al., 2019b; Zapatka et al., 2020), and alternative explanations are possible. However, HHV6A and HHV6B do integrate into human chromosomes, by a mechanism thought to involve recombination (Peddu et al., 2019). Despite biological plausibility, studies testing the involvement of HHV6 in breast cancer have had methodological limitations and remain inconclusive (Eliassen et al., 2018). The present study determines whether nearly universal human EBV infection is associated with breaks and other changes in breast cancer chromosomes. The work grew from the observation that both hereditary and sporadic breast cancers have the same kinds of chromosome damage seen in cancers with established EBV relationships.
These model EBV-related cancers sometimes have wildly inappropriate chromosome interconnections at multiple breakpoints. In lymphomas, EBV infection accompanies increased numbers of chromosome breaks, rearrangements, fusions, deletions, and insertions (Cuceu et al., 2018). In breast cells, these abnormalities are typically more abundant in populations at high risk for breast cancer. Even a single break in one cell can destabilize the entire human genome and generate many further complex rearrangements (Umbreit et al., 2020). In addition, EBV activation from its latent state causes massive changes in host chromatin methylation and structure (Tang et al., 2012). Aberrant methylation in breast cancer occurs at hundreds of host gene promoters and distal sequences (Batra et al., 2021).

Nasopharyngeal cancer (NPC) serves as a model for an EBV-associated epithelial cancer because NPC has an explicit EBV connection (Germini et al., 2020; Hau et al., 2020; Xu et al., 2019a). In NPC, 100% of malignant cells are EBV-positive, but the viral genome in the tumor has many single nucleotide polymorphisms. Nearly 8500 EBV forms are present in patients with EBV-associated cancers, with over 2100 variants in a single host, each differing only slightly from a reference genome. Variants of the viral gene BALF2 are significantly associated with NPC (Xu et al., 2019a). BALF2 is a single-stranded DNA binding protein active during the lytic phase (Tsurumi et al., 1996). Whole-genome sequencing of 63 NPCs and 7 NPC derivatives (Bruce et al., 2021) makes it possible to compare breakpoints in NPC with breakpoints in breast cancers. NPC has mutations affecting innate immunity, such as those in TGF-BR2, TGF-B2, TLR3, and the interferon-alpha and interferon-gamma receptors. Moreover, NF-KB pathways constitutively activate an inflammatory response (Bruce et al., 2021). These changes release controls on EBV infection.
In breast cancer, mutations or downregulation in DNA repair genes linked to BRCA1-BRCA2-mediated repair pathways compromise immunity (Friedenson, 2013). Immune deficits in sporadic breast cancers are well known.

Characteristics of hereditary breast cancers compared to viral cancers.

Selecting hereditary breast cancer genomes for this study required patient samples with a known, typed BRCA1 or BRCA2 gene mutation from two studies (Nik-Zainal et al., 2016; Nones et al., 2019). These hereditary cancers were mainly stage III ductal breast cancers or breast cancers of no specific type (Nones et al., 2019). The COSMIC database, curated from original publications (Nik-Zainal et al., 2016; Nik-Zainal et al., 2019), allowed comparisons of hereditary with sporadic breast cancer breakpoints. Seventy-four hereditary or likely hereditary breast cancers were selected as typed BRCA1 or BRCA2 mutation-associated cancers or breast cancers diagnosed before age 40. A study of familial breast cancers contributed another 65 BRCA1/BRCA2-associated breast cancers (Nones et al., 2019). Results were checked against breakpoints in 101 triple-negative breast cancers from a population-based study. Genome sequencing had been done before treatment began. Male breast cancers and cancers with BRCA1 or BRCA2 mutations diagnosed after age 49 were excluded, since such mutations are less likely to be pathogenic. Cancers with hereditary mutations in PALB2 and p53 were also excluded. Sporadic breast cancers were defined as those diagnosed after age 70 in the absence of a known inherited mutation.

Hereditary and sporadic breast cancer patient DNA sequence data.

Gene breakpoints for inter-chromosomal and intra-chromosomal translocations were obtained from the COSMIC catalog of somatic mutations, as curated from original publications, or from the original articles (Nik-Zainal et al., 2016; Nones et al., 2019; Staaf et al., 2019).
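The inclusion and exclusion rules above amount to a small decision procedure. The sketch below encodes one reading of them. The `Sample` record, its field names, and the use of the gene symbol TP53 for p53 are assumptions made for illustration and do not reflect the study's actual data model. The text does not address some cases, such as a woman aged 40 to 70 with no known mutation. Those cases fall through to an assumed default, which the comments mark.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: p53 is represented by its gene symbol TP53.
EXCLUDED_GENES = {"PALB2", "TP53"}
BRCA_GENES = {"BRCA1", "BRCA2"}


@dataclass
class Sample:
    """Hypothetical per-patient record; field names are illustrative."""
    sex: str                              # "F" or "M"
    age_at_diagnosis: int
    inherited_gene: Optional[str] = None  # e.g. "BRCA1"; None if no known inherited mutation


def classify(sample: Sample) -> str:
    """Return 'hereditary', 'sporadic' or 'excluded' under one reading of the criteria."""
    if sample.sex != "F":
        return "excluded"                 # male breast cancers were excluded
    gene = sample.inherited_gene
    if gene in EXCLUDED_GENES:
        return "excluded"                 # PALB2 and p53 carriers were excluded
    if gene in BRCA_GENES:
        # BRCA1/2 carriers diagnosed after age 49 were excluded.
        return "hereditary" if sample.age_at_diagnosis <= 49 else "excluded"
    if sample.age_at_diagnosis < 40:
        return "hereditary"               # "likely hereditary": diagnosed before age 40
    if gene is None and sample.age_at_diagnosis > 70:
        return "sporadic"                 # after age 70, no known inherited mutation
    return "excluded"                     # assumption: all other cases are left out


cohort = [Sample("F", 35, "BRCA1"), Sample("F", 55, "BRCA2"),
          Sample("F", 38), Sample("F", 74), Sample("M", 45, "BRCA2")]
print([classify(s) for s in cohort])
# ['hereditary', 'excluded', 'hereditary', 'sporadic', 'excluded']
```

The rule order matters. Exclusions are checked before the age-based hereditary rule, so a young PALB2 carrier is excluded rather than counted as likely hereditary.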
The GRCh38 human genome version was used, and chromosome coordinates were converted to GRCh38 when necessary. DNA flanking sequences at breakpoints were downloaded primarily using the UCSC genome browser; they did not differ from sequences obtained using the Ensembl genome browser. Positions of differentially methylated regions near breast cancer breakpoints (Tang et al., 2012) were compared with breakpoint positions in a set of 70 NPCs based on data from Bruce et al. (Bruce et al., 2021). [...] immunoediting are not well characterized. An extensive validation included both direct and indirect effects of gene mutations. In addition, the Online Mendelian Inheritance in Man database (www.OMIM.org) was routinely consulted to determine gene function, with frequent further support obtained through PubMed, Google Scholar, GeneCards, and UniProtKB. The "interferome" (www.interferome.org) was also sometimes used.

Viral homologies around breakpoints in BRCA-associated breast cancers cluster around breakpoints in NPC, a model cancer caused by EBV.

Based on genomes from 139 hereditary breast cancers and 70 nasopharyngeal cancers (NPC), Fig. 1 shows that every human chromosome in female breast cancers has breakpoints that cluster near those in NPC, a known EBV-mediated cancer. Peaks at the left of the Fig. 1 graphs show that most breakpoints are within 200,000 base pairs of breakpoints in NPC. (200,000 base pairs approximates the size of the EBV genome, allowing for some error.) The percentage of breast cancer breaks near NPC breaks varies between chromosomes and reaches 65% on chromosome 13. If chromosome breakpoint positions resulted from many independent events, probability theory would predict their distribution. In probability theory, the Central Limit Theorem states that sums of independent random variables tend toward a normal distribution (i.e., a bell-shaped curve) even if the original variables themselves are not normally distributed.
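The Central Limit Theorem argument predicts a bell-shaped distribution, and a normality test can check that prediction. The text does not name the test it used. This sketch substitutes the Jarque-Bera statistic because it needs only the Python standard library: JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is sample skewness and K is sample kurtosis. Under normality, JB approximately follows a chi-square distribution with 2 degrees of freedom, whose upper tail is exp(-JB/2). The hotspot positions are hypothetical.

```python
import math
import random
from statistics import fmean


def jarque_bera(values):
    """Return (JB statistic, approximate p-value) for a normality test.

    Uses moment estimators of skewness S and kurtosis K:
    JB = n / 6 * (S**2 + (K - 3)**2 / 4). Under normality JB is roughly
    chi-square with 2 degrees of freedom, whose upper tail is exp(-JB / 2).
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)


rng = random.Random(0)
# Many small independent contributions: each value is a sum of 12 uniforms.
summed = [sum(rng.random() for _ in range(12)) for _ in range(2000)]
# Hypothetical clustered breakpoints: positions piled up around three hotspots.
hotspots = [20_000_000, 21_000_000, 90_000_000]
clustered = [rng.choice(hotspots) + rng.randint(-100_000, 100_000) for _ in range(2000)]

print("sums of uniforms: p = %.3g" % jarque_bera(summed)[1])
print("clustered positions: p = %.3g" % jarque_bera(clustered)[1])
```

With these inputs, the hotspot-clustered positions give a vanishingly small p-value. The sums of uniforms, which the Central Limit Theorem describes, typically do not.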
If this logic applied to breast cancer breakpoints, they would occur at many positions, each making a small contribution, and the collection of breast cancers would show a Gaussian distribution of breakpoints. Contrary to this prediction, statistical tests for normality found no Gaussian distribution of breakpoints in any hereditary breast cancer, sporadic breast cancer, or nasopharyngeal cancer on any chromosome. Instead, many breakpoints are shared among cancers. The results for chromosome 8 were tested for agreement with the breakage-fusion-bridge model. Breakpoints in the breast cancer data were tested against chromosome 8 NPC breakpoints as follows. Clustered breakpoints were inserted into the original NPC data around the known NPC breaks: offsets of 10k, 25k, 50k, 100k, 150k, and 200k base pairs were added to and subtracted from NPC break positions until the total number of breaks equaled that in the breast cancer data. The augmented NPC data were then tested for correlation with breast cancer breaks by simple linear regression. The results correlate well (r = 0.973, r² = 0.975, p < 0.0001). Moreover, a histogram of NPC vs. breast cancer breakpoint positions showed that every NPC break coincided with breast cancer breaks within 1,000-10,000 base pairs.

Breakpoints in Burkitt's lymphoma are near EBV-like sequences.

Burkitt's lymphoma is a cancer strongly linked to EBV infection. It has a characteristic rearrangement of the MYC gene on chromosome 8 (Busch et al., 2007). Fig. 2 focuses the chromosome 8 homology data on a 10-megabase region around the MYC gene. The highly characteristic MYC breakpoints in Burkitt's lymphoma are surrounded by multiple DNA segments that strongly resemble EBV. In fact, 59 segments resembling viruses were within 200,000 base pairs of MYC breakpoints, and 54 of these were EBV-like sequences.
All the hereditary breast cancers tested have significant damage to genes needed for immune system functions, and the affected genes often overlap those affected by breakpoints in NPC (Fig. 3). Genes affected by breakpoints in familial breast cancers were compared with 4723 genes in the innate immunity database (Breuer et al., 2013). At least 1542 innate immunity-related genes were directly affected by breakpoints in 65 familial breast cancers. The damage interferes with responses to antigens and pathogens and with the ability to remove abnormal cells, and it likely allows latent EBV infections to escape control. Deregulation of innate immune responses may increase mutagenesis and drive multiple human cancers (Law et al., 2020). In the present case, adaptive immunity must control latent EBV infection. A compromised or suppressed innate immune system allows reactivation and persistent infection by populations of EBV variants. Specific genes and pathways mutated in NPC are also instrumental in driving hereditary breast cancers. Some of the breakpoints in 65 familial breast cancers directly affect these NPC gene drivers, such as NFKB1 and TGF-B. For example, 19 breast cancer breakpoints directly affect NFKB, 34 affect TGF-B, and 42 affect an interferon. Moreover, just as in NPC, breast cancers constitutively activate NFKB, causing an inappropriate inflammatory response (Nakshatri et al., 1997). Other herpesvirus-mediated cancers, such as Kaposi's sarcoma, also occur in the context of compromised immunity.

Virus-human homology comparisons were conducted around thousands of human BRCA-associated breakpoints. Long stretches of DNA from two human gamma-herpesvirus 4 variants, HKNPC60 and HKHD40, are virtually identical to human breast cancer inter-chromosomal breakpoint DNA. Maximum homology scores for human DNA vs. herpesvirus DNA are routinely over 4000, representing 97% identity over up to 2462 base pairs, with E ("expect") values, which are closely related to p-values, equal to 0.
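Homology scores like these are the kind BLAST reports. As a sketch, assume breakpoint flanking sequences were aligned against viral genomes with BLAST+ tabular output, for example `blastn -query flanks.fa -subject virus.fa -outfmt 6`. The function below keeps, for each query-subject pair, the best high-scoring pair (HSP) with a bit score at or above a threshold. The file names, sequence IDs, and numbers in the sample are hypothetical. The text does not say which BLAST program, database, or output format was used. The web interface's "maximum homology score" is assumed here to correspond to the best HSP bit score.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, min_score=500.0):
    """Return {(query, subject): best HSP} for HSPs with bitscore >= min_score."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue                      # skip blank and comment lines
        rec = dict(zip(FIELDS, row))
        rec["pident"] = float(rec["pident"])
        rec["length"] = int(rec["length"])
        rec["evalue"] = float(rec["evalue"])
        rec["bitscore"] = float(rec["bitscore"])
        if rec["bitscore"] < min_score:
            continue
        key = (rec["qseqid"], rec["sseqid"])
        if key not in best or rec["bitscore"] > best[key]["bitscore"]:
            best[key] = rec
    return best


# Hypothetical output for one breakpoint flank against two EBV variants.
SAMPLE = (
    "bp_chr8_001\tHKNPC60\t97.10\t2462\t60\t5\t1\t2462\t101\t2560\t0.0\t4100\n"
    "bp_chr8_001\tHKNPC60\t92.00\t300\t24\t0\t3000\t3300\t9000\t9300\t1e-80\t420\n"
    "bp_chr8_001\tHKHD40\t96.50\t1800\t63\t2\t10\t1810\t500\t2300\t0.0\t3050\n"
)

for (query, subject), hit in sorted(best_hits(SAMPLE).items()):
    print(query, subject, hit["pident"], hit["length"], hit["bitscore"])
```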
This extensive homology thus represents EBV-like breakpoint signatures.

Breakpoints in hereditary breast cancers cluster around human sequences that resemble EBV variants.

Fig. 4A shows all the viral homologies across the 145,138,636 base pairs of chromosome 8. Only a few different viruses have the strongest resemblance to human sequences. Over 11,000 regions have significant homology to EBV tumor variants. Relatively few positions (70) are similar to endogenous retroviruses, but the homology is strong. Breakpoints along the entire length of chromosome 11 were also compared with positions of homology to all viruses (Fig. 5). Chromosome 11 has 4141 inter- and intra-chromosomal breakpoints in the 139 hereditary breast cancers. Across all 135,086,622 base pairs of chromosome 11, there are 6212 matches to viral sequences with a maximum homology score above 500 within 200k base pairs of a breast cancer breakpoint, and 71% of these matches are related to EBV variants. Five repetitions of a control set of 4141 randomly generated breakpoints had only 24 to 36 matches within 200,000 base pairs of a human virus-like sequence. Moreover, the 139 hereditary breast cancers had 205 matches to viral sequences within 1000 bases of a breakpoint, whereas a random control had 0. EBV homologies still predominate on chromosome 17, even though there are fewer matches to EBV variants than on chromosome 11. The 83,257,441 base pairs of chromosome 17 have 24,206 matches with virus-like sequences, 14,859 of them within 200,000 base pairs of a breakpoint, whereas random breakpoints have only 396. Of 2147 breakpoints with a maximum homology score above 500 within 200k base pairs of a breakpoint, only 34% (737/2147) are related to EBV. For the data from both chromosome 11 and chromosome 17, Fisher's exact test gives an odds ratio of 137 (p < 0.0001) for the difference from random samples.
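The random-control comparison can be sketched as follows. The text does not describe exactly how the random breakpoints were generated or how the 2×2 table behind the Fisher test was built. This sketch assumes uniformly random positions along the chromosome. Its table counts breakpoints that are near vs. not near a virus-like sequence, for observed vs. random breakpoints. The text counts matches rather than breakpoints, so this is a simplification. The two-sided Fisher exact test is computed directly from the hypergeometric distribution: it sums the probabilities of all tables that are no more likely than the observed one. The site and breakpoint positions are generated for illustration only.

```python
import bisect
import math
import random

CHR11_LENGTH = 135_086_622   # chromosome 11 length quoted in the text
WINDOW = 200_000             # window used throughout the text


def count_near(breaks, sites, window=WINDOW):
    """Number of breakpoints lying within `window` bp of at least one site."""
    sites = sorted(sites)
    n = 0
    for b in breaks:
        i = bisect.bisect_left(sites, b - window)   # first site >= b - window
        if i < len(sites) and sites[i] <= b + window:
            n += 1
    return n


def random_breaks(n, chrom_len, seed=0):
    """Uniformly random control breakpoints (assumed generation scheme)."""
    rng = random.Random(seed)
    return [rng.randrange(chrom_len) for _ in range(n)]


def fisher_exact(a, b, c, d):
    """Two-sided Fisher exact test on the table [[a, b], [c, d]].

    Returns (odds ratio, p), where p sums the hypergeometric probabilities of
    all tables with the same margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):  # probability of x in the top-left cell, margins fixed
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    if b * c:
        odds = (a * d) / (b * c)
    else:
        odds = math.inf if a * d else math.nan
    return odds, min(1.0, p)


# Hypothetical data: 100 virus-like sites; half the "observed" breaks sit near them.
rng = random.Random(1)
sites = [rng.randrange(CHR11_LENGTH) for _ in range(100)]
observed = [min(max(0, rng.choice(sites) + rng.randint(-50_000, 50_000)), CHR11_LENGTH - 1)
            for _ in range(150)]
observed += random_breaks(150, CHR11_LENGTH, seed=2)
control = random_breaks(len(observed), CHR11_LENGTH, seed=3)

near_obs = count_near(observed, sites)
near_ctl = count_near(control, sites)
odds, p = fisher_exact(near_obs, len(observed) - near_obs,
                       near_ctl, len(control) - near_ctl)
print(f"near a site: observed {near_obs}/300, random {near_ctl}/300")
print(f"odds ratio {odds:.1f}, two-sided p = {p:.2g}")
```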
Hereditary breast cancer breakpoint homologies to EBV are near known EBV genome anchor sites: global comparisons on two chromosomes.

Most breakpoints on segments of chromosomes 2 and 12 are near EBV genome anchor sites. On a 21 Mb section of chromosome 2 (Fig. 6a), 63% of EBV docking sites are within 200k base pairs of a breakpoint in 139 BRCA1/BRCA2 hereditary or likely hereditary breast cancers. The docking of viral DNA likely coincides with EBV anchor sites because of the similarity between the human sequence and the viruses. Start positions of human sequences similar to the EBV variant tumor viruses HKNPC60 and HKHD40 aligned closely with almost all 56 EBV docking sites (Fig. 6a). Some areas within this region of chromosome 2 also have significant homology to human retroviruses, porcine retroviruses, and the SARS-CoV-2 virus. A relatively low background similarity to HIV1 spreads across the region. Clusters of breakpoints are obvious in Fig. 6a (black lines at the bottom of the graph in the upper part of the figure). In every case, these clusters include EBV docking sites, regions of strong EBV homology, or areas of homology to other viruses. Chromosome 2 has 1035 breakpoints or EBV docking sites; 924 of these breakpoints were within 200,000 base pairs of homology to an EBV variant or an EBV docking site, accounting for 89% of the breaks (Fig. 6b). There are 180 breakpoints on a 13 Mb section of chromosome 12, and only three are not near an EBV docking site. Most breaks are within 200k base pairs of a region with strong viral homology, defined as a maximum homology score >= 500. In all, there are 947 viral homologies, and 713 are within 200k base pairs of a break (Fig. 6b). The lower panels in Figs. 6a and 6b support the idea that breakpoints can initiate catastrophes. Breakpoints are all accessible, as indicated by DNase hypersensitivity. Many breakpoints disrupt gene regulation, gene interaction, and transcription.
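Statements such as "63% of EBV docking sites are within 200k base pairs of a breakpoint" reduce to a nearest-neighbor distance on sorted coordinates. The sketch below uses a binary search to find, for each query position, the distance to the closest reference position. It then computes the fraction of distances within a window and a coarse distance histogram of the kind plotted in Fig. 1. The positions are hypothetical. Both lists are assumed to be on the same chromosome and coordinate build (GRCh38 in this study).

```python
import bisect
from collections import Counter


def nearest_distances(queries, refs):
    """Distance (bp) from each query position to the closest reference position."""
    if not refs:
        raise ValueError("refs must not be empty")
    refs = sorted(refs)
    out = []
    for q in queries:
        i = bisect.bisect_left(refs, q)     # refs[i] is the first ref >= q
        candidates = []
        if i < len(refs):
            candidates.append(refs[i] - q)
        if i > 0:
            candidates.append(q - refs[i - 1])
        out.append(min(candidates))
    return out


def fraction_within(distances, window=200_000):
    """Share of distances that are <= window."""
    return sum(d <= window for d in distances) / len(distances)


def distance_histogram(distances, bin_size=200_000):
    """Counts per distance bin; bin k covers [k * bin_size, (k + 1) * bin_size)."""
    return Counter(d // bin_size for d in distances)


docking_sites = [1_000_000, 5_000_000, 9_000_000]     # hypothetical
breakpoints = [1_150_000, 4_700_000, 20_000_000]      # hypothetical
d = nearest_distances(docking_sites, breakpoints)
print(d)                                       # [150000, 300000, 4300000]
print(fraction_within(d))                      # 0.3333333333333333
print(sorted(distance_histogram(d).items()))   # [(0, 1), (1, 1), (21, 1)]
```

Swapping the two arguments answers the converse question, namely what fraction of breakpoints lie near a docking site. The two fractions generally differ.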
Breakpoints all go through ENCODE candidate cis-regulatory elements (cCREs). Many breaks affect cancer-associated (COSMIC) genes. Some breakpoints disrupt the activating epigenetic mark H3K27ac, a histone modification at enhancers associated with increased transcription. Most breast cancer breakpoints are near repressive H3K9me3 peaks in CD14+ primary monocytes (RO-01946). These marks occur around EBV genome anchor sites, where they contribute to viral latency and repress transcription. Both regions, on chromosomes 2 and 12, are rich in these sites (Figs. 6a and 6b). Both the chromosome 2 and chromosome 12 sections are foci for structural variation such as CNVs, inversions, and short insertions/deletions. Multiple breakpoints on either chromosome 2 or 12 disrupt reference genes. Human reference sequence genes in the chromosome 2 region include KYNU, GTDC1, ACVR2A, KIF5C, STAM2, KCNJ3, ERMN, PKP4, BAZ2B, TANK, and DPP4 (Fig. 6a, bottom). Breast cancer breaks near at least some of these genes interrupt functions essential for immunity and for preventing cancer. For example, KYNU mediates the response to IFN-gamma, TANK is necessary for NFKB activation in the innate immune system, and DPP4 is essential for preventing viral entry into cells. Reference gene functions in the breakpoint region of chromosome 12 (Fig. 6b, bottom) include vesicle trafficking (RASSF9), endocytosis (EEA1), blood cell formation (KITLG), interferon response control (SOCS2), and nerve cell patterning (NR2C1). Potential retrovirus-like DNA sequences differ between chromosome 2 and chromosome 12 (Figs. 6a vs. 6b). On chromosome 2, porcine endogenous retrovirus (PERV) (Denner, 2017), human endogenous retrovirus (HERV) (Prusty et al., 2008), and pseudogene (pHERV) sequences also have significant homology to human DNA. The PERV-like sequence lies within a retroposed area on chromosome 2, within 28 kb 5' and 80 kb 3' of EBV sequences.
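Whether a breakpoint "goes through" a cCRE or disrupts a reference gene is a question of interval containment. The sketch below maps each breakpoint to the names of annotated intervals (genes, cCREs, or histone-mark peaks) that contain it. It then intersects the hit genes with a gene set, as in the comparison with the innate immunity database. The coordinates are toy values, not real annotations, and the immune gene set is hypothetical. Intervals are treated as half-open [start, end). A linear scan keeps the logic clear; an interval tree would scale better to genome-wide annotation files.

```python
import bisect
from typing import Dict, List, Tuple

Interval = Tuple[int, int, str]   # (start, end, name); half-open [start, end)


def annotate(breaks: List[int], intervals: List[Interval]) -> Dict[int, List[str]]:
    """Map each breakpoint to the names of intervals that contain it."""
    intervals = sorted(intervals)
    starts = [s for s, _, _ in intervals]
    hits = {}
    for b in breaks:
        k = bisect.bisect_right(starts, b)   # intervals[:k] start at or before b
        hits[b] = [name for s, e, name in intervals[:k] if b < e]
    return hits


def features_hit(hits: Dict[int, List[str]]) -> set:
    """All feature names disrupted by at least one breakpoint."""
    return {name for names in hits.values() for name in names}


# Toy coordinates only; real analyses would load gene and cCRE annotations.
features = [(100, 200, "KYNU"), (150, 400, "cCRE_1"), (500, 600, "TANK")]
immune_genes = {"TANK", "DPP4"}                   # hypothetical gene set
hits = annotate([160, 450, 599], features)
print(hits)                                # {160: ['KYNU', 'cCRE_1'], 450: [], 599: ['TANK']}
print(features_hit(hits) & immune_genes)   # {'TANK'}
```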
EBV-like ends potentially generate homologies for retroposition and insertion of PERV sequences. In contrast, significant retroviral sequence homologies lie outside the breakpoint-rich stretch on chromosome 12 (Fig. 6b). PERV and HERV variants (orange) may also contribute to breakpoints, but they are distributed very differently on the two chromosome sections. Fig. 7 uses independent data to establish consistent relationships between breast cancer breaks and EBV genome anchor sites precisely identified at disparate chromosome or gene locations (Lu et al., 2010). A primary EBV genome binding site on chromosome 11 (Lu et al., 2010) matches a few breast cancer breakpoints. Another known breakpoint on chromosome 1 near the CDC7 gene also corresponds to multiple breast cancer breaks. It remains possible that BRCA1 and BRCA2 mutations are sufficient to cause chromosome breaks without contributions from EBV variants or other viruses. To examine this, breakpoints in breast cancers from 74 women over age 70 with no known hereditary BRCA1 or BRCA2 mutations (Nik-Zainal et al., 2016) were tested for relationships to hereditary breast cancer breakpoints. These sporadic breast cancer patients are older than the hereditary breast cancer patients, so mutations have had more time to accumulate; for example, some base substitution signatures correlate positively with age (Alexandrov et al., 2020). Yet inter-chromosomal translocation breakpoints also occur in sporadic cancers with normal BRCA1 and BRCA2 genes (Fig. 8). Inter-chromosomal breakpoints tend to cluster in specific chromosome regions in individual breast cancers. Although there are significant differences in breakpoint distributions (Fig. 8), many hereditary and sporadic breast cancer breakpoints cluster in similar areas of the same chromosomes. Important potential confounders include promoter hypermethylation, which provides an alternative way of inactivating BRCA1 and BRCA2 genes.
BRCA1 and BRCA2 methylation is uncommon in sporadic cancers: average methylation scores in 1538 sporadic breast cancers (Batra et al., 2021) were 0.050 and 0.004 for the BRCA1 and BRCA2 promoters, respectively. Triple-negative breast cancer may be another significant confounder, because up to 58.6% of 237 patients had a significant marker predictive of BRCA1/BRCA2 deficiency. However, triple-negative breast cancers made up only 9% of breast cancers in the cohort. The breakage-fusion-bridge cycle is a catastrophe often related to telomere dysfunction or a break near the end of a chromosome. During tumor development, the cycle causes clustering of chromosome breakpoints and chromothripsis (Leibowitz et al., 2015; Umbreit et al., 2020), while telomere dysfunction promotes end-to-end fusions. For 74 breast cancers in women over age 70, sporadic cancer breakpoint positions were compared with breakpoints in 70 nasopharyngeal cancers as a model for EBV-mediated breakage. These analyses establish a robust and reliable association between sporadic breast cancer breaks and breakpoints in NPC. This association occurs without BRCA1 and BRCA2 gene mutations. The data (not shown) generally resemble the results for hereditary cancers in Fig. 8. Chromosome 11 may be a notable exception, because sporadic breast cancer and NPC share 78% of breakpoints on chromosome 11. However, all breakpoints are less common in sporadic cancers than in hereditary cancers. Although EBV-positive NPC and gastric cancers have distinctive patterns of genes with DNA hypermethylation, some frequently hypermethylated genes are shared. Chromosome 6p21.3 is a potential EBV infection signature, because the 6p21.3 region (chr6:30,500,001-36,600,000) is hypermethylated in EBV-positive NPC and gastric cancer (Scott, 2017). To determine whether this potential marker is also hypermethylated in breast cancers, methylation data from 1538 breast cancers (Batra et al., 2021) were tested. Fig. 8 shows that promoter methylation in this marker region differs significantly between breast cancers and normal controls. Gene promoters on 6p21.3 inhibited by hypermethylation primarily control complement function, a system that integrates innate and adaptive immune responses against challenges from pathogens and abnormal cells. Moreover, most of the DNA breaks in this region are close to regions of homology to EBV sequences.

The EBV tumor viruses HKHD40 and HKNPC60 are typical of many other herpesvirus isolates, with some haplotypes conferring a high NPC risk (Xu et al., 2019a). About 100 other gamma-herpesvirus variants strongly matched HKHD40 and HKNPC60 in regions with enough data to make comparisons possible. HKNPC60 is 99% identical to the EBV reference sequence at bases 1-7500 and 95% identical at bases 1,200,000-1,405,000. HKHD40 is 99% and 98% identical, respectively, for comparisons with the same regions. Lu et al. found 4785 EBNA1 binding sites, with over 50% overlapping potential fragile sites as a repetitive sequence element (Lu et al., 2010). Kim et al. reported that EBNA1 anchor sites have A-T-rich flanking sequences, with runs of consecutive A-T bases. According to the fragile site database, chromosome 1 contains 658 fragile site genes, the most of any chromosome (Kumar et al., 2019). Although some fragile sites align with breast cancer breaks on chromosome 1, large numbers of breaks on chromosomes 4 and 12 (Fig. 10) and most other chromosomes are inconsistent with the fragile site database. On all the chromosomes tested, there were many more breakpoints than fragile sites. Breast cancer breaks do not consistently occur near common fragile sites (Fig. 10). Some hereditary breast cancer breakpoints were tested for repeats likely to generate fragile sites because such repeats are difficult to replicate. This test did not find such sequences (supplementary Table S1). In contrast, many inter-chromosomal breaks are close to human EBV-like sequences.
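Two sequence features discussed here can be screened for with simple pattern matching: A-T-rich runs flanking EBNA1 anchor sites and repeats that are hard to replicate. The sketch below reports the longest run of consecutive A/T bases, the A-T fraction, and perfect tandem repeats of short units. The thresholds (units of 1 to 6 bases, at least 5 copies) are assumptions made for illustration. The text does not state the criteria used for supplementary Table S1. Dedicated tools such as Tandem Repeats Finder also handle imperfect repeats, which this exact-match scan misses.

```python
import re


def longest_at_run(seq):
    """Length of the longest run of consecutive A/T bases."""
    runs = re.findall(r"[AT]+", seq.upper())
    return max((len(r) for r in runs), default=0)


def at_fraction(seq):
    """Fraction of bases that are A or T."""
    s = seq.upper()
    return (s.count("A") + s.count("T")) / len(s) if s else 0.0


def is_primitive(unit):
    """True if unit is not itself a repeat of a shorter unit (e.g. 'ATAT' is not)."""
    return (unit + unit).find(unit, 1) == len(unit)


def tandem_repeats(seq, max_unit=6, min_copies=5):
    """Perfect tandem repeats as (start, unit, copies); thresholds are assumptions."""
    s = seq.upper()
    hits = []
    for size in range(1, max_unit + 1):
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (size, min_copies - 1))
        for m in pattern.finditer(s):
            unit = m.group(2)
            if is_primitive(unit):          # skip e.g. 'ATAT' when 'AT' is reported
                hits.append((m.start(), unit, len(m.group(1)) // size))
    return hits


flank = "GC" + "AT" * 6 + "GGG" + "A" * 7 + "C"   # hypothetical flanking sequence
print(longest_at_run(flank))    # 12
print(at_fraction(flank))       # 0.76
print(tandem_repeats(flank))    # [(17, 'A', 7), (2, 'AT', 6)]
```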
Replication errors in even one cell can trigger a sudden catastrophe that cascades into further breaks, destabilizing the entire genome (Umbreit et al., 2020). This cascade is especially likely in hereditary breast cancers, with their deficits in homologous recombination repair. Most human cancer viruses merely initiate or promote cancer and are not sufficient to cause cancer by themselves. The present work shows that human sequences related to tumor virus variants correlate with the positions of chromosome breaks in breast cancers. Multiple independent lines of evidence support this view; they are summarized below (1-12).

1. Breakpoints in both hereditary and sporadic breast cancers match breakpoints in NPC, a cancer mediated by EBV variants. This matching holds for every chromosome in females.
2. A potential EBV methylation signature shared with known EBV cancers is far more abundant in 1538 breast cancers than in normal controls.
3. Breast cancer breakpoints are consistently near binding sites for EBV.
4. Every human chromosome in female breast cancers shows breakpoints that are close to EBV-like human DNA.
5. The association of breast cancer breaks with EBV variants does not depend on the presence of BRCA1 or BRCA2 gene mutations. However, breast cancer breaks are more abundant in BRCA1 and BRCA2 mutation carriers.
6. The association of EBV variant sequences with chromosome breakpoints does not require the continuing presence of active viruses anywhere within the resulting tumor. One breakpoint in a single cell can generate further breaks during cell division and destabilize the entire human genome. Showers of mutations also occur after illicit break repair.
7. Deficient immune responses in breast cancer tissue make it unduly susceptible to reactivation of exogenous EBV-like infections. The tissue also becomes unable to remove abnormal cells. Some breakpoints directly disrupt essential immune response genes.
8. Fragile site sequences are not sufficient to account for breast cancer breaks.
9. Tumor variants HKHD40 and HKNPC60 are herpesviruses closely related to the known tumor virus populations KSHV and EBV.
10. EBV infection is nearly universal and can cause genomic instability well before breast cancer occurs.
11. EBV can also activate endogenous retroviral sequences, which also occur near some breakpoints.
12. Epidemiologic data show that active EBV infection increases breast cancer risk about five-fold over controls. This estimate is probably conservative because it does not consider viral variants.

Retroviruses make up 5-8% of our DNA. Some retroviruses have copied pieces of their DNA into the human genome within the last million years (Marchi et al., 2014). The impact of retroviral incorporation on human disease is not settled. The sheer number of strong human-EBV variant homologies is surprising and might exceed the number of retroviral homologies. This similarity reflects the constant interplay between herpesvirus DNA and human DNA during evolution. Some of these interactions are already recognized as causing severe disease. For example, the related herpesviruses HHV-6A/6B can integrate into the telomeres of any chromosome. These viruses affect about 1% of humans (Tweedy et al., 2016), causing severe disease on reactivation. Cytomegalovirus, another related herpesvirus, causes birth defects.

Fig. 11 summarizes a proposed model based on the breakage-fusion-bridge model of McClintock (McClintock, 1941). Genomes from populations of EBV variants infecting an individual bind to human DNA at sequences matching the variants. The DNA then breaks. Under the best conditions, the break is quickly repaired by rejoining the two fragments.
Viral binding prevents the break from quickly rejoining, but EBV variants can also cause breaks more directly by producing nucleases (Wu et al., 2010) and by promoting telomere dysfunction and replication stress (Hafez and Luftig, 2017). The number of abnormal chromosome products quickly increases whether or not a second chromosome also breaks. Because the broken ends lack telomeres, a second broken chromosome can link to the first, but the product then has two centromeres. During anaphase, the two centromeres try to separate and form a bridge, and the chromosome breaks again. The bridge region does not replicate normally during mitosis, and chromothripsis becomes an integral part of the breakage-fusion-bridge mechanism (Umbreit et al., 2020). The process then continues, with or without viruses, to destabilize the entire human genome. Chromothripsis is thought to occur in concert with this cycle (Umbreit et al., 2020), so the complexity of the population of abnormal chromosomes increases quickly. BRCA1 and BRCA2 mutations increase the number of breaks, but the process also occurs, albeit less frequently, in sporadic breast cancers. Fragments of chromosomes, such as those generated during chromothripsis, can also join chromosomes that are not protected by telomeres. Translocations can generate a burst of localized somatic mutations through the actions of APOBEC ("kataegis") (Nik-Zainal and Morganella, 2017). APOBEC enzymes typically act to inactivate viral infections, and effects related to APOBEC3 probably occur in EBV-induced carcinogenesis (Bobrovnitchaia et al., 2020; Law et al., 2020). Disruptions of gene regulation around translocation breakpoints in breast cancers could easily deregulate APOBEC3. Breast cancers are selective in their partners for improper repair: partners for inter-chromosomal rearrangements and translocations are typically close to each other.
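The breakage-fusion-bridge cycle described above can be illustrated with an idealized toy simulation. It follows McClintock's classic sister-chromatid version of the model, not the study's Fig. 11 model, which also allows fusion with a second broken chromosome. A chromosome is a tuple of segment labels with the centromere first and the telomere lost at the right-hand end. After replication, the sister chromatids fuse at their broken ends. This produces a dicentric chromosome carrying an inverted copy, in which inverted segments are marked with a trailing apostrophe. The anaphase bridge then breaks at a chosen, hypothetical position. The simulation does not model viral binding, chromothripsis, or translocations. Repeated cycles amplify the segments next to the break, which is a hallmark of breakage-fusion-bridge cycles.

```python
from collections import Counter


def invert(segments):
    """Reverse segment order and flip each segment's orientation (A <-> A')."""
    return tuple(s[:-1] if s.endswith("'") else s + "'" for s in reversed(segments))


def bfb_cycle(chrom, break_index):
    """One idealized breakage-fusion-bridge cycle.

    chrom: segment labels, centromere first, telomere missing at the right end.
    After replication the sisters fuse at the broken ends: chrom + invert(chrom),
    a dicentric. In anaphase the bridge breaks at break_index; the daughter that
    keeps the left centromere inherits fused[:break_index], again lacking a
    telomere, so the cycle can repeat.
    """
    fused = chrom + invert(chrom)
    if not 0 < break_index < len(fused):
        raise ValueError("break must fall between the two centromeres")
    return fused[:break_index]


def copy_number(chrom):
    """Copies of each segment, ignoring orientation."""
    return Counter(s.rstrip("'") for s in chrom)


chrom = ("cen", "A", "B", "C")        # telomere lost beyond segment C
for break_index in (5, 7):            # hypothetical break positions
    chrom = bfb_cycle(chrom, break_index)
    print(chrom, dict(copy_number(chrom)))
```

After two cycles with these break positions, segment C is present in four copies as alternating inverted repeats, while A and B remain single-copy.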
An individual chromosome resides in its own spatial domain in the nucleus relative to other chromosomes (Leibowitz et al., 2015). Interference from viral DNA adds to this spatial limitation and changes translocation partners. EBV causes massive changes in the spatial distribution of chromosomes, so that broken chromosome fragments have new nearby translocation partners. Nonetheless, spread plots of all chromosome breaks (data not shown) find multiple breakpoints shared among breast cancers. Figs. 6a and 6b show human DNA homologies to SARS-CoV-2. Host factors are the primary determinant of the severity of SARS-CoV-2 infection. These areas of homology (Fig. 6), together with deregulated immune responses, may add genomic host factors that influence the severity of SARS-CoV-2 infection. Multiple components of immune defenses are mutated in hereditary breast cancer genomes. The mutations affect processes such as cytokine production and autophagy. These functions depend on many genes dispersed throughout the genome, so damage to just one of these genes can cripple an immune function. Each breast cancer genome has a different set of these mutations, with the same gene only occasionally damaged (Friedenson, 2013; 2015). Damage affecting the nervous system is also universal. Some herpesviruses establish occult infection within the central nervous system even after other sites become virus-free (Bhela et al., 2014). Many breast cancer mutations also affect the nervous system, which increases damage from herpesvirus infection. The immune system and the nervous system communicate extensively. Neurotransmitters from parasympathetic and sympathetic neurons control immune activity. For example, TLR3 stimulation by viral infection triggers cytokine production from neurons that promotes immune responses.
Injury, autoimmune conditions, hypoxia, and neurodegeneration activate immune cells in the central nervous system (Kioussis and Pachnis, 2009). In addition to gene defects in BRCA1 and BRCA2, other inherited gene defects increase susceptibility to EBV infection and EBV-driven diseases. These inherited forms are associated with mutations in SH2D1A, ITK, MAGT1, CTPS1, CD27, CD70, CORO1A, and RASGRP1 (Latour and Winter, 2018). Over 50% of patients with one of these defects develop EBV-driven lymphoproliferative disease, including Hodgkin and non-Hodgkin lymphomas. Severe infections with other herpes viruses (CMV, HSV, HHV-6) are also common. The results of the present work are potentially actionable. The current evidence adds support for a childhood herpes vaccine and EBV antiviral treatment. The prospects for producing an EBV vaccine are promising, but the most appropriate targets are still not settled. Some immunotherapy strategies rely on augmenting the immune response, but this approach may need modification because mutations create additional holes in the immune response. Retroviruses and retrotransposons (Helman et al., 2014) may also participate in breast cancer breaks. Participation from porcine endogenous retroviruses would be actionable by thoroughly cooking pork products. However, despite assertions that xenotransplantation with pig cells is safe, it is concerning that up to 6500 bp in human chromosome 11 are virtually identical to pig DNA (Fig. 5a). Obtaining a sample that represents the breadth of all somatic and hereditary breast cancers is a significant problem. There is no assurance that even large numbers of breast cancers are adequately representative, because they are not a random sample of all breast cancers (Friedenson, 2009).
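A claim such as "virtually identical" over thousands of base pairs is usually backed by a percent-identity value from an alignment. As an illustration only, the sketch below computes identity the way BLAST reports its "Identities" figure: identical aligned positions divided by alignment length, with gap columns counted in the length. It assumes the two sequences are already aligned and padded with `-` for gaps; the function name is hypothetical, and the example sequences are made up rather than taken from the chromosome 11 or pig sequences in Fig. 5a.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences carry the same base.

    Columns containing a gap ('-') in either sequence count toward the
    alignment length but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # 8 identical columns out of 10 (one gap column, one mismatch)
    print(percent_identity("ACGT-ACGTA", "ACGTTACGAA"))  # 80.0
```

Because gaps are penalized in the denominator, a long stretch scoring near 100% by this measure has very few indels as well as very few mismatches.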
The data used here comes from 560 breast cancer genome sequences, familial cancer data from 78 patients, methylation data from 1538 breast cancers vs 244 controls, and 243 triple-negative breast cancers (Batra et al., 2021; Nik-Zainal et al., 2016; Nones et al., 2019; Staaf et al., 2019).

The repertoire of mutational signatures in human cancer.
Basic local alignment search tool.
Epstein Barr Virus Associated Lymphomas and Epithelia Cancers in Humans.
Age-specific prevalence of Epstein-Barr virus infection among individuals aged 6-19 years in the United States and factors affecting its acquisition.
DNA methylation landscapes of 1538 breast cancers reveal a replication-linked clock, epigenomic instability and cis-regulation.
Critical role of microRNA-155 in herpes simplex encephalitis.
APOBEC-mediated DNA alterations: A possible new mechanism of carcinogenesis in EBV-positive gastric cancer.
InnateDB: systems biology of innate immunity and beyond--recent updates and continuing curation.
Whole-genome profiling of nasopharyngeal carcinoma reveals viral-host co-operation in inflammatory NF-kappaB activation and immune escape.
Identification of two distinct MYC breakpoint clusters and their association with various IGH breakpoint regions in the t(8;14) translocations in sporadic Burkitt-lymphoma.
Pan-cancer Immunogenomic Analyses Reveal Genotype-Immunophenotype Relationships and Predictors of Response to Checkpoint Blockade.
Biopython: freely available Python tools for computational molecular biology and bioinformatics. /bioinformatics/btp163
Chromosomal Instability in Hodgkin Lymphoma: An In-Depth Review and Perspectives. 3390/cancers10040091
The porcine virome and xenotransplantation.
Human Herpesvirus 6 and Malignancy: A Review.
Frequency and genome load of Epstein-Barr virus in 509 breast cancers from different geographical areas.
Dewey defeats Truman and cancer statistics.
Mutations in components of antiviral or microbial defense as a basis for breast cancer.
Mutations in Breast Cancer Exome Sequences Predict Susceptibility to Infections and Converge on the Oncogenic Properties of the EBV ZEBRA Protein. Cancers (Basel) 12. 10.3390/cancers12061479
Targeting Epstein-Barr Virus in Nasopharyngeal Carcinoma.
Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing.
Epstein-Barr Virus Infection of Mammary Epithelial Cells Promotes Malignant Transformation.
Epigenetic specifications of host chromosome docking sites for latent Epstein-Barr virus.
Immune and nervous systems: more than just a superficial similarity.
HumCFS: a database of fragile sites in human chromosomes.
Inherited Immunodeficiencies With High Predisposition to Epstein-Barr Virus-Driven Lymphoproliferative Diseases.
APOBEC3A catalyzes mutation and drives carcinogenesis in vivo.
Catching viral breast cancer.
Chromothripsis: A New Mechanism for Rapid Karyotype Evolution.
Characterization of Epstein Barr virus latency pattern in Argentine breast carcinoma.
Genome-wide analysis of host-chromosome binding sites for Epstein-Barr Virus Nuclear Antigen 1 (EBNA1).
InnateDB: facilitating systems-level analyses of the mammalian innate immune response.
Impaired Replication Timing Promotes Tissue-Specific Expression of Common Fragile Sites. 3390/genes11030326
Unfixed endogenous retroviral insertions in the human population.
Epstein-Barr virus infection and clinical outcome in breast cancer patients correlate with immune cell TNF-alpha/IFN-gamma response.
The stability of broken ends of chromosomes in Zea Mays.
Using the Basic Local Alignment Search Tool (BLAST).
Constitutive activation of NF-kappaB during progression of breast cancer to hormone-independent growth.
A general method applicable to search for similarities in the amino acid sequence of two proteins.
Landscape of somatic mutations in 560 breast cancer whole-genome sequences.
Author Correction: Landscape of somatic mutations in 560 breast cancer whole-genome sequences.
Mutational Signatures in Breast Cancer: The Problem at the DNA Level. Clinical Cancer Research.
Whole-genome sequencing reveals clinically relevant insights into the aetiology of familial breast cancers.
ImmTree: database of evolutionary relationships of genes and proteins in the human immune system.
Molecular characterization of the immune system: emergence of proteins, processes, and domains.
Inherited Chromosomally Integrated Human Herpesvirus 6 Demonstrates Tissue-Specific RNA Expression In Vivo That Correlates with an Increased Antibody Immune Response.
Multiplex PCR/mass spectrometry screening of biological carcinogenic agents in human mammary tumors.
Transcription of HERV-E and HERV-E-related sequences in malignant and non-malignant human haematopoietic cells.
Epstein-Barr virus: a master epigenetic manipulator.
Is EBV Associated with Breast Cancer in Specific Geographic Locations? Cancers (Basel).
Whole-genome sequencing of triple-negative breast cancers in a population-based clinical study.
Major chromosomal breakpoint intervals in breast cancer co-localize with differentially methylated regions.
virus single-stranded DNA-binding protein: purification, characterization, and action on DNA synthesis by the viral DNA polymerase.
Complete Genome Sequence of Germline Chromosomally Integrated Human Herpesvirus 6A and Analyses Integration Sites Define a New Human Endogenous Virus with Potential to Reactivate as an Emerging Infection.
Mechanisms generating cancer genome complexity from a single cell division error.
Visualizing genomic information across chromosomes with PhenoGram.
Epstein-Barr virus DNase (BGLF5) induces genomic instability in human epithelial cells.
Genomewide Analysis of Epstein-Barr Virus (EBV) Integration and Strain in C666-1 and Raji Cells.
Genome sequencing analysis identifies Epstein-Barr virus subtypes associated with high risk of nasopharyngeal carcinoma.
Genome-wide profiling of Epstein-Barr virus integration by targeted sequencing in Epstein-Barr virus associated malignancies.
Viral and host factors related to the clinical outcome of COVID-19.
A greedy algorithm for aligning DNA sequences.

Supplementary Table S1. Absence of inverted repeats at breakpoints in BRCA2-associated breast cancers. Chromosome coordinates for breaks vs nearby unbroken sequences were assayed for repeats within 100 base pairs in either direction using Breakpoint or non-Breakpoint
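The assay described in the Supplementary Table S1 caption (looking for repeats within 100 base pairs on either side of a break) can be approximated with a brute-force scan for inverted repeats, that is, a k-mer followed downstream by its reverse complement. The caption does not name the tool that was used, so this is a swapped-in illustration under stated assumptions: a plain DNA string, 0-based coordinates, a minimum arm length `k` of 8, non-overlapping arms, and hypothetical function names (`revcomp`, `inverted_repeats_near`).

```python
from typing import Dict, List, Tuple

# Complement table for upper- and lower-case DNA bases.
_COMP = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMP)[::-1]


def inverted_repeats_near(
    seq: str, pos: int, flank: int = 100, k: int = 8
) -> List[Tuple[int, int]]:
    """Find inverted-repeat arm pairs within `flank` bp of `pos`.

    Returns (i, j) pairs of 0-based coordinates in `seq` where the k-mer at j
    is the reverse complement of the k-mer at i and starts after it ends.
    """
    start = max(0, pos - flank)
    end = min(len(seq), pos + flank)
    window = seq[start:end].upper()

    # Index every k-mer start position in the window.
    index: Dict[str, List[int]] = {}
    for i in range(len(window) - k + 1):
        index.setdefault(window[i:i + k], []).append(i)

    hits = []
    for i in range(len(window) - k + 1):
        for j in index.get(revcomp(window[i:i + k]), []):
            if j >= i + k:  # downstream, non-overlapping second arm
                hits.append((start + i, start + j))
    return hits


if __name__ == "__main__":
    arm = "GATTACAC"
    demo = "A" * 50 + arm + "TTTT" + revcomp(arm) + "A" * 50
    print(inverted_repeats_near(demo, 60))  # [(50, 62)]
```

Running the same function at nearby unbroken positions gives the comparison set the caption describes; an empty result at a breakpoint is consistent with the reported absence of inverted repeats, which could otherwise fold into hairpin structures.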