key: cord-0275678-ihevywu0
authors: Bowyer, Paul; Currin, Andrew; Delneri, Daniela; Fraczek, Marcin G.
title: Telomere to telomere sequence of model Aspergillus fumigatus genomes
date: 2022-03-27
journal: bioRxiv
DOI: 10.1101/2022.03.26.485923
sha: 26ffba2e120dcefb663bfeadb8afd698d6d5d5d6
doc_id: 275678
cord_uid: ihevywu0

The pathogenic fungus Aspergillus fumigatus is a major etiological agent of invasive and chronic fungal diseases affecting tens of millions of individuals worldwide. A high-quality reference genome is a fundamental resource for studying its biology, pathogenicity and virulence, and for discovering better and more effective treatments against the diseases caused by this fungus. Here, we used PacBio Single Molecule Real-Time (SMRT) and Oxford Nanopore sequencing for de novo genome assembly of two laboratory reference strains of A. fumigatus, CEA10 and A1160. We generated full-length chromosome assemblies with comprehensive telomere-to-telomere coverage for these two strains, including the ribosomal repeats and the sequences of the centromeres, which we discovered to be composed of long transposon elements.

Aspergillus fumigatus causes over 11 million allergic and over 3 million chronic and invasive lung infections annually, representing a significant complication of profound immunosuppression, chronic obstructive pulmonary disease (COPD), severe viral respiratory infections (such as influenza or COVID-19) and many other pre-existing conditions (1-4). Even with effective treatment, mortality rates for invasive disease remain ~50% (5) and exceed 80% for individuals infected with drug-resistant isolates (6). The availability of genome sequences has underpinned many of the rapid advances in our understanding of this organism in recent years.

The first A. fumigatus genome sequence, for the clinical isolate Af293, was published in 2005 (7), followed by that of strain A1163 in 2008 (8).
These two reference genome sequences have been crucial for studying the biology and pathogenicity of this fungus. However, owing to the technological limitations of the time, the original reference sequences are incomplete: they contain several blocks of inserted and duplicated sequence, some sequences are absent (deletions), and unknown nucleotides (NNN) are present. Moreover, these sequences lack coverage of the centromeres, an accurate sequence of the ribosomal repeats, and a comprehensive annotation of chromosomal rearrangements such as translocations and inversions. In particular, A1163 was not sequenced at sufficient depth for a full assembly and remains a set of poorly organised contigs and scaffolds (8). A1163 and strains derived from its parental isolate CEA10 (9, 10) have become standard in laboratory experiments because of their robust pathogenicity and growth. For example, the CEA10 descendant isolate A1160, recently renamed MFIG001

Our data show that the genomes of A1160 and CEA10 are almost identical in sequence apart from a small number of SNPs (96) in several genes (Supplementary Dataset 2). The SNP differences are most evident on chromosome 8, where we also observed several nucleotide insertions and deletions (INDELs) leading to frameshifts. In total, there are 34 INDELs between the two strains. For strain A1160, the telomere on chromosome 6 could not be completely assembled because of chromosomal rearrangements.

Ribosomal sequence was extracted from the raw data using grep to capture reads known to contain A. fumigatus ribosomal sequences. For the Oxford Nanopore data, the repeat regions were obtained as assembled contigs. The core assembly showed only a single 28S repeat, which is likely due to mis-assembly of the repeat units.
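The grep-based capture of ribosomal reads described above is, in essence, a substring search over the raw reads. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' exact command. It assumes well-formed four-line FASTQ records, uses a hypothetical placeholder seed (`GATTACA`) rather than a real A. fumigatus rDNA sequence, and searches both strands (a plain grep would need the reverse complement supplied as a second pattern). The function names are illustrative only.

```python
from typing import Iterable, Iterator, Tuple

# Translation table for complementing DNA (upper and lower case; N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) from well-formed four-line FASTQ records.

    Blank lines are skipped; an incomplete trailing record is silently dropped.
    """
    it = (line.strip() for line in lines if line.strip())
    # zip() over the same iterator groups consecutive lines into 4-tuples.
    for header, seq, _plus, _qual in zip(it, it, it, it):
        yield header[1:].split()[0], seq


def ribosomal_reads(
    records: Iterable[Tuple[str, str]], seed: str
) -> Iterator[Tuple[str, str]]:
    """Yield records whose sequence contains `seed` on either strand."""
    fwd = seed.upper()
    rev = reverse_complement(fwd)
    for read_id, seq in records:
        s = seq.upper()
        if fwd in s or rev in s:
            yield read_id, seq


# Hypothetical toy reads: r1 carries the seed, r2 its reverse complement, r3 neither.
EXAMPLE_FASTQ = """\
@r1
AAGATTACAAA
+
IIIIIIIIIII
@r2
CCTGTAATCGG
+
IIIIIIIIIII
@r3
CCCCCCCCCCC
+
IIIIIIIIIII
""".splitlines()

# "GATTACA" is a placeholder seed, not a real A. fumigatus rDNA sequence.
hits = [rid for rid, _ in ribosomal_reads(parse_fastq(EXAMPLE_FASTQ), "GATTACA")]
print(hits)  # ['r1', 'r2']
```

With a real file, `parse_fastq(open(path))` streams records without loading them all into memory. Exact substring matching misses reads whose copy of the seed carries sequencing errors, which are frequent in raw Nanopore reads, so a longer or more tolerant search would likely be needed in practice.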
As the number of repeats is not clearly distinguishable, the 28S segment was left as a marker for the region on chromosome

The mitochondrial sequences of both strains were also analysed, and we found that our assembled data are consistent with the previously published sequences for A1160 and Af293 (19).

The new genome assembly reveals previously undetected gene sequences and chromosomal rearrangements

The original sequence of Af293 was generated in 2005 using the whole-genome random sequencing method (7). Although it still provides crucial sequence data, it does not include the centromeres or chromosomal rearrangements. In Table 1, we summarise the predicted chromosome sizes and gene counts from our PacBio analysis of A1160 and CEA10 and compare them with those in the database for Af293. Two different pipelines were used in this analysis, revealing no major differences in size between the previously generated reference sequences and our newly assembled genomes. As previously shown (7), the genome of A. fumigatus is arranged in 8 chromosomes totalling approximately 29.2 Mb. However, compared with the previously sequenced Af293, A1160 and CEA10 have approximately 300 more genes (Table 1), reflecting the more accurate assembly.

Protein-coding gene transcripts, short ncRNAs, tRNAs and transposons were annotated based on our de novo analysis and data from FungiDB (Fig. 1; Supplementary Datasets 5 and 6). When determining centromere locations, we observed that transposable elements, besides being scattered throughout the genome as predicted, were also localised in the centromeres of all 8 chromosomes, where they form the majority of the centromeric sequence. Although it was previously predicted that the centromeres of filamentous fungi may be composed of transposons (20), our study is the first to confirm that the centromeres of A.
fumigatus chromosomes are enriched with transposable elements. An example of a detailed chromosomal annotation is presented in Fig. 2.

Our sequencing data also confirmed the location of the native ku80 gene in CEA10 (9) as well as the replacement of this gene with pyrG+ in A1160, on chromosome 2 (12) (Fig. 3).

Comparison of the genome of the reference strain Af293 with those of the sequenced CEA10 and A1160 revealed a number of chromosomal rearrangements (Fig. 4A and B). The largest rearrangements are between the ends of chromosomes 1 and 6 (a situation previously suggested by the original A1163 sequencing (8)). Chromosomal rearrangements and chromosomal breakpoint usage have been proposed to play a significant role in evolution leading to environmental adaptation, and such events have previously been observed in filamentous fungi (21-23). As both A1160 and CEA10 have been widely used for >20 years, they might be expected to have accrued mutations and chromosomal rearrangements.

The availability of comprehensive genome sequences of A. fumigatus strains is crucial to understanding the biology, pathogenicity and virulence of this fungus. Moreover, high-quality genome sequences are proving to be a powerful tool for discovering mechanisms of drug resistance and may lead to more efficient patient treatment and recovery. Here, we provide the comprehensive, telomere-to-telomere genome sequences of two widely used isolates of A.

Two strains of A. fumigatus, CEA10 and A1160 (10, 12), were used in this study. High-quality genomic DNA was extracted from fungal spores following a previously described CTAB method (12), with a few modifications that greatly improved the quality and purity of the extracted DNA.
Briefly, both isolates were grown on SAB agar in tissue culture flasks to minimise cross-contamination, and spores were harvested in PBS/Tween 20 and transferred to 2 mL screw-top tubes containing 425-600 µm washed glass beads (filled to the 300 µL mark; ~50 mg) (Merck). Spores were centrifuged at max speed for 2 min in a benchtop centrifuge and the supernatant was removed. 1 mL of CTAB extraction buffer (2% CTAB, 100 mM Tris, 1.4 M NaCl and 10 mM EDTA, pH 8.0) was added, and the tubes were vortexed at max speed for 10 minutes. Subsequently, the tubes were incubated for 10 min at 65°C. The vortexing and heating steps were then repeated, and the tubes were centrifuged at max speed for 2 minutes. The supernatant was transferred to new 2 mL tubes and an equal volume of chloroform was added. The tubes were mixed by inversion and centrifuged for 3 minutes at max speed. The aqueous phase was transferred to new 1.5 mL tubes and the DNA was precipitated by adding 0.6 volumes of isopropyl alcohol. Following centrifugation for 2 minutes at max speed, the supernatant was decanted, and the pellet was washed with 0.5 mL of absolute ethanol. The pellet was briefly air-dried and resuspended in 200 µL of dH2O. Subsequently, 2 µL of 100 mg/mL RNase A (Qiagen) was added and the tubes were incubated at 37°C for 15 minutes. Then, 1 mL of buffer PB or PM (Qiagen), which contains a high concentration of guanidine hydrochloride and isopropanol, was added and mixed by pipetting. The solution was transferred onto silica-based blue columns (NBS Biologicals) and centrifuged for 30 seconds at max speed. Then, 700 µL of buffer PE (Qiagen) was added to the column, which was centrifuged as above, followed by an additional 1-minute spin at max speed.
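The CTAB extraction buffer composition given above (2% CTAB, 100 mM Tris, 1.4 M NaCl, 10 mM EDTA, pH 8.0) translates into weigh-out amounts by simple arithmetic. The Python sketch below is illustrative only, and the reagent forms are assumptions not stated in the protocol: Tris is added as Tris base (121.14 g/mol) and titrated to pH 8.0 separately, NaCl is 58.44 g/mol, CTAB is 2% w/v, and EDTA comes from a 0.5 M stock. The function name `ctab_buffer_recipe` is hypothetical.

```python
"""Weigh-out calculator for the CTAB extraction buffer (illustrative sketch)."""

# Assumed molar masses (g/mol); reagent forms are assumptions, not from the protocol.
MW_TRIS_BASE = 121.14
MW_NACL = 58.44


def ctab_buffer_recipe(volume_ml: float, edta_stock_molar: float = 0.5) -> dict:
    """Return amounts for `volume_ml` of 2% CTAB, 100 mM Tris, 1.4 M NaCl,
    10 mM EDTA buffer. pH adjustment to 8.0 is not included."""
    volume_l = volume_ml / 1000.0
    return {
        # 2% w/v means 2 g per 100 mL.
        "CTAB_g": 2.0 * volume_ml / 100.0,
        # mass (g) = concentration (mol/L) x volume (L) x molar mass (g/mol).
        "Tris_base_g": 0.100 * volume_l * MW_TRIS_BASE,
        "NaCl_g": 1.4 * volume_l * MW_NACL,
        # C1 * V1 = C2 * V2, solved for the stock volume V1 (mL).
        "EDTA_stock_mL": 0.010 * volume_ml / edta_stock_molar,
    }


if __name__ == "__main__":
    for name, amount in ctab_buffer_recipe(100.0).items():
        print(f"{name}: {amount:.3f}")
```

For 100 mL, this gives 2 g of CTAB, about 1.21 g of Tris base, about 8.18 g of NaCl and 2 mL of 0.5 M EDTA stock, before the pH is adjusted.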
The DNA was eluted in 100 µL of dH2O, and its quality was assessed on a 1% agarose gel as well as with a NanoDrop (Thermo Fisher Scientific) and a Qubit 4 Fluorometer (Thermo Fisher Scientific).

Assembly Process (HGAP4) was used, with 30x seed coverage and a specified genome length of 29 Mb for each assembly (all other parameters were left unchanged). Assembly polishing and resequencing were performed using the Resequencing algorithm in SMRT Link 8.0.

For the Oxford Nanopore data, base calling was performed using Guppy (Oxford Nanopore) and de novo assembly was performed using Canu 1.9 (24), with a specified genome length of 29 Mb.

For CEA10, the PacBio and Nanopore assemblies were then polished using 3 rounds of Pilon (25) with 2 paired-end Illumina 2x150 FASTQ files to give the final CEA10 sequence.

Annotation
Genomes were subjected to a cursory annotation using a GeneMark-EP+ pipeline (26) guided by ProtHint 2.5.0 using OrthoDB version 10.1, as previously described (27)

References
Action For Fungal Infections
Pathophysiological aspects of Aspergillus colonization in disease
COVID-19-Associated Pulmonary Aspergillosis in an Immunocompetent Host: A Case
Aspergillus fumigatus and pan-azole resistance: who should be concerned?
Hidden 258 killers: human fungal infections Emergence of azole-resistant invasive 261 aspergillosis in HSCT recipients in Germany Genomic sequence of 269 the pathogenic and allergenic filamentous fungus Aspergillus fumigatus Genomic islands in the pathogenic filamentous 277 fungus Aspergillus fumigatus Virulence of 279 alkaline protease-deficient mutants of Aspergillus fumigatus On the lineage of Aspergillus fumigatus isolates in common laboratory use The akuB(KU80) mutant deficient for 286 nonhomologous end joining is a powerful tool for analyzing pathogenicity in 287 Aspergillus fumigatus The cdr1B efflux transporter is associated with non-290 cyp51a-mediated itraconazole resistance in Aspergillus fumigatus 293 Fast and Reliable PCR Amplification from Aspergillus fumigatus Spore Suspension 294 Without Traditional DNA Extraction High-Throughput Gene Replacement in 297 Aspergillus fumigatus The negative cofactor 2 complex is a key regulator of drug 302 resistance in Aspergillus fumigatus Reveals Transposable Elements as a Key Contributor to Genomic Plasticity and 306 Virulence Variation in Magnaporthe oryzae Linking 311 secondary metabolites to gene clusters through genome sequencing of six diverse 312 Aspergillus species Draft Genome 314 Sequence of Aspergillus awamori IFM 58123(NT). Microbiol Resour Announc 8 Sequencing of mitochondrial genomes of nine Aspergillus and Penicillium species 318 identifies mobile introns and accessory genes as main sources of genome size 319 variability Centromeres of 321 filamentous fungi Evolution, selection and isolation: a genomic view of speciation 323 in fungal plant pathogens Comparative Genomics of Aspergillus flavus 325 S and L Morphotypes Yield Insights into Niche Adaptation. 
G3 (Bethesda) Sequence breakpoints in the aflatoxin 328 biosynthesis gene cluster and flanking regions in nonaflatoxigenic Aspergillus flavus 329 isolates Canu: 331 scalable and accurate long-read assembly via adaptive k-mer weighting and repeat 332 separation Pilon: an integrated tool for comprehensive 335 microbial variant detection and genome assembly improvement GeneMark-EP+: eukaryotic gene 337 prediction with self-training in the space of genes and proteins OrthoDB v10: sampling the diversity of animal, plant, fungal, protist, 341 bacterial and viral genomes for evolutionary and functional annotations of orthologs BRAKER2: 344 automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS 345 supported by a protein database Automated generation of heuristics for biological sequence 347 comparison