Carrel name: keyword-genome-cord Creating study carrel named keyword-genome-cord Initializing database file: cache/cord-000556-uu1oz2ei.json key: cord-000556-uu1oz2ei authors: Kumar, Ranjit; Lawrence, Mark L.; Watt, James; Cooksey, Amanda M.; Burgess, Shane C.; Nanduri, Bindu title: RNA-Seq Based Transcriptional Map of Bovine Respiratory Disease Pathogen “Histophilus somni 2336” date: 2012-01-20 journal: PLoS One DOI: 10.1371/journal.pone.0029435 sha: doc_id: 556 cord_uid: uu1oz2ei file: cache/cord-000012-p56v8wi1.json key: cord-000012-p56v8wi1 authors: Bigot, Yves; Samain, Sylvie; Augé-Gouillou, Corinne; Federici, Brian A title: Molecular evidence for the evolution of ichnoviruses from ascoviruses by symbiogenesis date: 2008-09-18 journal: BMC Evol Biol DOI: 10.1186/1471-2148-8-253 sha: doc_id: 12 cord_uid: p56v8wi1 file: cache/cord-004123-1s8kuno2.json key: cord-004123-1s8kuno2 authors: Jaiswal, Arun Kumar; Tiwari, Sandeep; Jamal, Syed Babar; de Castro Oliveira, Letícia; Alves, Leandro Gomes; Azevedo, Vasco; Ghosh, Preetam; Oliveira, Carlo Jose Freira; Soares, Siomar C. title: The pan-genome of Treponema pallidum reveals differences in genome plasticity between subspecies related to venereal and non-venereal syphilis date: 2020-01-10 journal: BMC Genomics DOI: 10.1186/s12864-019-6430-6 sha: doc_id: 4123 cord_uid: 1s8kuno2 file: cache/cord-012473-p66of6kq.json key: cord-012473-p66of6kq authors: Celniker, Susan E.; Dillon, Laura A. L.; Gerstein, Mark B.; Gunsalus, Kristin C.; Henikoff, Steven; Karpen, Gary H.; Kellis, Manolis; Lai, Eric C.; Lieb, Jason D.; MacAlpine, David M.; Micklem, Gos; Piano, Fabio; Snyder, Michael; Stein, Lincoln; White, Kevin P.; Waterston, Robert H. 
title: Unlocking the secrets of the genome date: 2009-06-17 journal: Nature DOI: 10.1038/459927a sha: doc_id: 12473 cord_uid: p66of6kq file: cache/cord-018804-wj35q88f.json key: cord-018804-wj35q88f authors: Lázaro, Ester title: Genetic Variability in RNA Viruses: Consequences in Epidemiology and in the Development of New Stratgies for the Extinction of Infectivity date: 2007 journal: Structural Approaches to Sequence Evolution DOI: 10.1007/978-3-540-35306-5_15 sha: doc_id: 18804 cord_uid: wj35q88f file: cache/cord-018437-yjvwa1ot.json key: cord-018437-yjvwa1ot authors: Mitchell, Michael title: Taxonomy date: 2013-08-26 journal: Viruses and the Lung DOI: 10.1007/978-3-642-40605-8_3 sha: doc_id: 18437 cord_uid: yjvwa1ot file: cache/cord-264746-gfn312aa.json key: cord-264746-gfn312aa authors: Muse, Spencer title: GENOMICS AND BIOINFORMATICS date: 2012-03-29 journal: Introduction to Biomedical Engineering DOI: 10.1016/b978-0-12-238662-6.50015-x sha: doc_id: 264746 cord_uid: gfn312aa file: cache/cord-016588-f8uvhstb.json key: cord-016588-f8uvhstb authors: Sintchenko, Vitali title: Informatics for Infectious Disease Research and Control date: 2009-10-03 journal: Infectious Disease Informatics DOI: 10.1007/978-1-4419-1327-2_1 sha: doc_id: 16588 cord_uid: f8uvhstb file: cache/cord-267714-ji88tvsl.json key: cord-267714-ji88tvsl authors: JAKUPCIAK, JOHN P.; COLWELL, RITA R. title: Biological agent detection technologies date: 2009-04-21 journal: Mol Ecol Resour DOI: 10.1111/j.1755-0998.2009.02632.x sha: doc_id: 267714 cord_uid: ji88tvsl file: cache/cord-014461-2ubh9u8r.json key: cord-014461-2ubh9u8r authors: Nelson, Oranmiyan W.; Garrity, George M. 
title: Genome sequences published outside of Standards in Genomic Sciences, July - October 2012 date: 2012-10-10 journal: Stand Genomic Sci DOI: 10.4056/sigs.3416907 sha: doc_id: 14461 cord_uid: 2ubh9u8r file: cache/cord-265329-bsypo08l.json key: cord-265329-bsypo08l authors: van Dorp, Lucy; Acman, Mislav; Richard, Damien; Shaw, Liam P.; Ford, Charlotte E.; Ormond, Louise; Owen, Christopher J.; Pang, Juanita; Tan, Cedric C.S.; Boshier, Florencia A.T.; Ortiz, Arturo Torres; Balloux, François title: Emergence of genomic diversity and recurrent mutations in SARS-CoV-2 date: 2020-05-05 journal: Infect Genet Evol DOI: 10.1016/j.meegid.2020.104351 sha: doc_id: 265329 cord_uid: bsypo08l file: cache/cord-269124-oreg7rnj.json key: cord-269124-oreg7rnj authors: Spyrou, Maria A.; Bos, Kirsten I.; Herbig, Alexander; Krause, Johannes title: Ancient pathogen genomics as an emerging tool for infectious disease research date: 2019-04-05 journal: Nat Rev Genet DOI: 10.1038/s41576-019-0119-1 sha: doc_id: 269124 cord_uid: oreg7rnj file: cache/cord-265581-pbv8mjfc.json key: cord-265581-pbv8mjfc authors: Tong, Yaojun; Deng, Zixin title: An aurora of natural products-based drug discovery is coming date: 2020-06-06 journal: Synth Syst Biotechnol DOI: 10.1016/j.synbio.2020.05.003 sha: doc_id: 265581 cord_uid: pbv8mjfc file: cache/cord-265857-fs6dj3dp.json key: cord-265857-fs6dj3dp authors: Liu, Yu-Tsueng title: Infectious Disease Genomics date: 2010-12-24 journal: Genetics and Evolution of Infectious Disease DOI: 10.1016/b978-0-12-384890-1.00010-8 sha: doc_id: 265857 cord_uid: fs6dj3dp file: cache/cord-000902-ew8orn0z.json key: cord-000902-ew8orn0z authors: Zhao, Xiangyan; Tian, Yonglei; Yang, Ronghua; Feng, Haiping; Ouyang, Qingjian; Tian, You; Tan, Zhongyang; Li, Mingfu; Niu, Yile; Jiang, Jianhui; Shen, Guoli; Yu, Ruqin title: Coevolution between simple sequence repeats (SSRs) and virus genome size date: 2012-08-30 journal: BMC Genomics DOI: 10.1186/1471-2164-13-435 sha: doc_id: 902 
cord_uid: ew8orn0z file: cache/cord-001340-kqcx7lrq.json key: cord-001340-kqcx7lrq authors: Ladner, Jason T.; Beitzel, Brett; Chain, Patrick S. G.; Davenport, Matthew G.; Donaldson, Eric; Frieman, Matthew; Kugelman, Jeffrey; Kuhn, Jens H.; O’Rear, Jules; Sabeti, Pardis C.; Wentworth, David E.; Wiley, Michael R.; Yu, Guo-Yun; Sozhamannan, Shanmuga; Bradburne, Christopher; Palacios, Gustavo title: Standards for Sequencing Viral Genomes in the Era of High-Throughput Sequencing date: 2014-06-17 journal: mBio DOI: 10.1128/mbio.01360-14 sha: doc_id: 1340 cord_uid: kqcx7lrq file: cache/cord-007923-j3jpqd7k.json key: cord-007923-j3jpqd7k authors: O'Brien, Stephen J. title: Cats date: 2004-12-14 journal: Curr Biol DOI: 10.1016/j.cub.2004.11.017 sha: doc_id: 7923 cord_uid: j3jpqd7k file: cache/cord-005281-wy0zk9p8.json key: cord-005281-wy0zk9p8 authors: Blinov, V. M.; Zverev, V. V.; Krasnov, G. S.; Filatov, F. P.; Shargunov, A. V. title: Viral component of the human genome date: 2017-05-09 journal: Mol Biol DOI: 10.1134/s0026893317020066 sha: doc_id: 5281 cord_uid: wy0zk9p8 file: cache/cord-324811-yjwavea5.json key: cord-324811-yjwavea5 authors: Kidgell, Claire; Winzeler, Elizabeth A. title: Elucidating genetic diversity with oligonucleotide arrays date: 2005 journal: Chromosome Res DOI: 10.1007/s10577-005-1503-6 sha: doc_id: 324811 cord_uid: yjwavea5 file: cache/cord-017932-vmtjc8ct.json key: cord-017932-vmtjc8ct authors: Georgiev, Vassil St. title: Genomic and Postgenomic Research date: 2009 journal: National Institute of Allergy and Infectious Diseases, NIH DOI: 10.1007/978-1-60327-297-1_25 sha: doc_id: 17932 cord_uid: vmtjc8ct file: cache/cord-348059-wa1gjbck.json key: cord-348059-wa1gjbck authors: Gibbs, Richard A. 
title: The Human Genome Project changed everything date: 2020-08-07 journal: Nat Rev Genet DOI: 10.1038/s41576-020-0275-3 sha: doc_id: 348059 cord_uid: wa1gjbck file: cache/cord-304607-td0776wj.json key: cord-304607-td0776wj authors: Paszkiewicz, Konrad H.; Giezen, Mark van der title: Omics, Bioinformatics, and Infectious Disease Research date: 2010-12-24 journal: Genetics and Evolution of Infectious Disease DOI: 10.1016/b978-0-12-384890-1.00018-2 sha: doc_id: 304607 cord_uid: td0776wj file: cache/cord-281959-g4sjyytr.json key: cord-281959-g4sjyytr authors: Phillippy, Adam M; Deng, Xiangyu; Zhang, Wei; Salzberg, Steven L title: Efficient oligonucleotide probe selection for pan-genomic tiling arrays date: 2009-09-16 journal: BMC Bioinformatics DOI: 10.1186/1471-2105-10-293 sha: doc_id: 281959 cord_uid: g4sjyytr file: cache/cord-016798-tv2ntug6.json key: cord-016798-tv2ntug6 authors: Gautam, Ablesh; Tiwari, Ashish; Malik, Yashpal Singh title: Bioinformatics Applications in Advancing Animal Virus Research date: 2019-06-06 journal: Recent Advances in Animal Virology DOI: 10.1007/978-981-13-9073-9_23 sha: doc_id: 16798 cord_uid: tv2ntug6 file: cache/cord-016293-pyb00pt5.json key: cord-016293-pyb00pt5 authors: Newell-McGloughlin, Martina; Re, Edward title: The flowering of the age of Biotechnology 1990–2000 date: 2006 journal: The Evolution of Biotechnology DOI: 10.1007/1-4020-5149-2_4 sha: doc_id: 16293 cord_uid: pyb00pt5 file: cache/cord-301709-kvyes2lz.json key: cord-301709-kvyes2lz authors: Baker, Susan C.; Jukneliene, Dalia; Purkayastha, Anjan; Snyder, Eric E.; Crasta, Oswald R.; Czar, Michael J.; Setubal, Joao C.; Sobral, Bruno W. 
title: Developing Bioinformatic Resources for Coronaviruses date: 2006 journal: The Nidoviruses DOI: 10.1007/978-0-387-33012-9_70 sha: doc_id: 301709 cord_uid: kvyes2lz file: cache/cord-298136-mel9fxw8.json key: cord-298136-mel9fxw8 authors: O'Malley, Maureen A.; Bostanci, Adam; Calvert, Jane title: Whole-genome patenting date: 2005-05-10 journal: Nat Rev Genet DOI: 10.1038/nrg1613 sha: doc_id: 298136 cord_uid: mel9fxw8 file: cache/cord-268795-tjmx6msm.json key: cord-268795-tjmx6msm authors: Sardar, Rahila; Satish, Deepshikha; Birla, Shweta; Gupta, Dinesh title: Comparative analyses of SAR-CoV2 genomes from different geographical locations and other coronavirus family genomes reveals unique features potentially consequential to host-virus interaction and pathogenesis date: 2020-03-21 journal: bioRxiv DOI: 10.1101/2020.03.21.001586 sha: doc_id: 268795 cord_uid: tjmx6msm file: cache/cord-003316-r5te5xob.json key: cord-003316-r5te5xob authors: Balloux, Francois; Brønstad Brynildsrud, Ola; van Dorp, Lucy; Shaw, Liam P.; Chen, Hongbin; Harris, Kathryn A.; Wang, Hui; Eldholm, Vegard title: From Theory to Practice: Translating Whole-Genome Sequencing (WGS) into the Clinic date: 2018-12-17 journal: Trends Microbiol DOI: 10.1016/j.tim.2018.08.004 sha: doc_id: 3316 cord_uid: r5te5xob file: cache/cord-277687-u3q36o3e.json key: cord-277687-u3q36o3e authors: Shean, Ryan C.; Makhsous, Negar; Stoddard, Graham D.; Lin, Michelle J.; Greninger, Alexander L. 
title: VAPiD: a lightweight cross-platform viral annotation pipeline and identification tool to facilitate virus genome submissions to NCBI GenBank date: 2019-01-23 journal: BMC Bioinformatics DOI: 10.1186/s12859-019-2606-y sha: doc_id: 277687 cord_uid: u3q36o3e file: cache/cord-348515-bqqyly23.json key: cord-348515-bqqyly23 authors: Zhao, Suhui; Wan, Chengsong; Ke, Changwen; Seto, Jason; Dehghan, Shoaleh; Zou, Lirong; Zhou, Jie; Cheng, Zetao; Jing, Shuping; Zeng, Zhiwei; Zhang, Jing; Wan, Xuan; Wu, Xianbo; Zhao, Wei; Zhu, Li; Seto, Donald; Zhang, Qiwei title: Re-emergent Human Adenovirus Genome Type 7d Caused an Acute Respiratory Disease Outbreak in Southern China After a Twenty-one Year Absence date: 2014-12-08 journal: Sci Rep DOI: 10.1038/srep07365 sha: doc_id: 348515 cord_uid: bqqyly23 file: cache/cord-320005-i30t7cvr.json key: cord-320005-i30t7cvr authors: Pardo, A. title: The Human Genome and Advances in Medicine: Limits and Future Prospects date: 2004-03-31 journal: Archivos de Bronconeumología ((English Edition)) DOI: 10.1016/s1579-2129(06)70078-7 sha: doc_id: 320005 cord_uid: i30t7cvr file: cache/cord-302047-vv5gpldi.json key: cord-302047-vv5gpldi authors: Willemsen, Anouk; Zwart, Mark P title: On the stability of sequences inserted into viral genomes date: 2019-11-14 journal: Virus Evol DOI: 10.1093/ve/vez045 sha: doc_id: 302047 cord_uid: vv5gpldi file: cache/cord-350747-5t5xthk6.json key: cord-350747-5t5xthk6 authors: Gmyl, A. P.; Agol, V. I. 
title: Diverse Mechanisms of RNA Recombination date: 2005 journal: Mol Biol DOI: 10.1007/s11008-005-0069-x sha: doc_id: 350747 cord_uid: 5t5xthk6 file: cache/cord-015850-ef6svn8f.json key: cord-015850-ef6svn8f authors: Saitou, Naruya title: Eukaryote Genomes date: 2013-08-22 journal: Introduction to Evolutionary Genomics DOI: 10.1007/978-1-4471-5304-7_8 sha: doc_id: 15850 cord_uid: ef6svn8f file: cache/cord-310406-5pvln91x.json key: cord-310406-5pvln91x authors: Asbury, Thomas M; Mitman, Matt; Tang, Jijun; Zheng, W Jim title: Genome3D: A viewer-model framework for integrating and visualizing multi-scale epigenomic information within a three-dimensional genome date: 2010-09-02 journal: BMC Bioinformatics DOI: 10.1186/1471-2105-11-444 sha: doc_id: 310406 cord_uid: 5pvln91x file: cache/cord-340423-f8ab7413.json key: cord-340423-f8ab7413 authors: Barr, J.N.; Fearns, R. title: Genetic Instability of RNA Viruses date: 2016-09-09 journal: Genome Stability DOI: 10.1016/b978-0-12-803309-8.00002-1 sha: doc_id: 340423 cord_uid: f8ab7413 file: cache/cord-275683-1qj9ri18.json key: cord-275683-1qj9ri18 authors: Roux, Simon; Matthijnssens, Jelle; Dutilh, Bas E. title: Metagenomics in Virology date: 2019-06-12 journal: Reference Module in Life Sciences DOI: 10.1016/b978-0-12-809633-8.20957-6 sha: doc_id: 275683 cord_uid: 1qj9ri18 file: cache/cord-330312-1pjolkql.json key: cord-330312-1pjolkql authors: Liu, Y.-T. 
title: Infectious Disease Genomics date: 2017-01-20 journal: Genetics and Evolution of Infectious Diseases DOI: 10.1016/b978-0-12-799942-5.00010-x sha: doc_id: 330312 cord_uid: 1pjolkql file: cache/cord-297669-22fctxk4.json key: cord-297669-22fctxk4 authors: Proudfoot, Chris; Lillico, Simon; Tait-Burkard, Christine title: Genome editing for disease resistance in pigs and chickens date: 2019-06-25 journal: Anim Front DOI: 10.1093/af/vfz013 sha: doc_id: 297669 cord_uid: 22fctxk4 file: cache/cord-022128-r8el8nqm.json key: cord-022128-r8el8nqm authors: Domingo, Esteban title: Molecular basis of genetic variation of viruses: error-prone replication date: 2019-11-08 journal: Virus as Populations DOI: 10.1016/b978-0-12-816331-3.00002-7 sha: doc_id: 22128 cord_uid: r8el8nqm file: cache/cord-304498-ty41xob0.json key: cord-304498-ty41xob0 authors: Denison, Mark R; Graham, Rachel L; Donaldson, Eric F; Eckerle, Lance D; Baric, Ralph S title: Coronaviruses: An RNA proofreading machine regulates replication fidelity and diversity date: 2011-03-01 journal: RNA Biology DOI: 10.4161/rna.8.2.15013 sha: doc_id: 304498 cord_uid: ty41xob0 file: cache/cord-314594-xvc8hvpq.json key: cord-314594-xvc8hvpq authors: Singh, Roshan Kumar; Prasad, Ashish; Muthamilarasan, Mehanathan; Parida, Swarup K.; Prasad, Manoj title: Breeding and biotechnological interventions for trait improvement: status and prospects date: 2020-09-18 journal: Planta DOI: 10.1007/s00425-020-03465-4 sha: doc_id: 314594 cord_uid: xvc8hvpq file: cache/cord-346335-el45v0a5.json key: cord-346335-el45v0a5 authors: Tan, H.S. 
title: Fourier spectral density of the coronavirus genome date: 2020-08-11 journal: bioRxiv DOI: 10.1101/2020.06.30.180034 sha: doc_id: 346335 cord_uid: el45v0a5 file: cache/cord-318392-r9bbomvk.json key: cord-318392-r9bbomvk authors: Woo, Patrick CY; Lau, Susanna KP; Tsang, Chi-Ching; Lau, Candy CY; Wong, Po-Chun; Chow, Franklin WN; Fong, Jordan YH; Yuen, Kwok-Yung title: Coronavirus HKU15 in respiratory tract of pigs and first discovery of coronavirus quasispecies in 5′-untranslated region date: 2017-06-21 journal: Emerg Microbes Infect DOI: 10.1038/emi.2017.37 sha: doc_id: 318392 cord_uid: r9bbomvk file: cache/cord-334394-qgyzk7th.json key: cord-334394-qgyzk7th authors: Edgar, Robert C.; Taylor, Jeff; Altman, Tomer; Barbera, Pierre; Meleshko, Dmitry; Lin, Victor; Lohr, Dan; Novakovsky, Gherman; Al-Shayeb, Basem; Banfield, Jillian F.; Korobeynikov, Anton; Chikhi, Rayan; Babaian, Artem title: Petabase-scale sequence alignment catalyses viral discovery date: 2020-08-10 journal: bioRxiv DOI: 10.1101/2020.08.07.241729 sha: doc_id: 334394 cord_uid: qgyzk7th file: cache/cord-316033-xg8eb2nm.json key: cord-316033-xg8eb2nm authors: Easton, Alice; Gao, Shenghan; Lawton, Scott P; Bennuru, Sasisekhar; Khan, Asis; Dahlstrom, Eric; Oliveira, Rita G; Kepha, Stella; Porcella, Stephen F; Webster, Joanne; Anderson, Roy; Grigg, Michael E; Davis, Richard E; Wang, Jianbin; Nutman, Thomas B title: Molecular evidence of hybridization between pig and human Ascaris indicates an interbred species complex infecting humans date: 2020-11-06 journal: eLife DOI: 10.7554/elife.61562 sha: doc_id: 316033 cord_uid: xg8eb2nm file: cache/cord-352619-s2x53grh.json key: cord-352619-s2x53grh authors: Payne, Natalie; Kraberger, Simona; Fontenele, Rafaela S; Schmidlin, Kara; Bergeman, Melissa H; Cassaigne, Ivonne; Culver, Melanie; Varsani, Arvind; Van Doorslaer, Koenraad title: Novel Circoviruses Detected in Feces of Sonoran Felids date: 2020-09-15 journal: Viruses DOI: 10.3390/v12091027 sha: doc_id: 
352619 cord_uid: s2x53grh file: cache/cord-022262-ck2lhojz.json key: cord-022262-ck2lhojz authors: Gromeier, Matthias; Wimmer, Eckard; Gorbalenya, Alexander E. title: Genetics, Pathogenesis and Evolution of Picornaviruses date: 2007-09-02 journal: Origin and Evolution of Viruses DOI: 10.1016/b978-012220360-2/50013-1 sha: doc_id: 22262 cord_uid: ck2lhojz Reading metadata file and updating bibliogrpahics === updating bibliographic database Building study carrel named keyword-genome-cord === file2bib.sh === id: cord-012473-p66of6kq author: Celniker, Susan E. title: Unlocking the secrets of the genome date: 2009-06-17 pages: extension: .txt txt: ./txt/cord-012473-p66of6kq.txt cache: ./cache/cord-012473-p66of6kq.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-012473-p66of6kq.txt' === file2bib.sh === OMP: Error #34: System unable to allocate necessary resources for OMP thread: OMP: System error #11: Resource temporarily unavailable OMP: Hint Try decreasing the value of OMP_NUM_THREADS. 
/data-disk/reader-compute/reader-cord/bin/file2bib.sh: line 39: 39042 Aborted $FILE2BIB "$FILE" > "$OUTPUT" === file2bib.sh === id: cord-268795-tjmx6msm author: Sardar, Rahila title: Comparative analyses of SAR-CoV2 genomes from different geographical locations and other coronavirus family genomes reveals unique features potentially consequential to host-virus interaction and pathogenesis date: 2020-03-21 pages: extension: .txt txt: ./txt/cord-268795-tjmx6msm.txt cache: ./cache/cord-268795-tjmx6msm.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-268795-tjmx6msm.txt' === file2bib.sh === id: cord-001340-kqcx7lrq author: Ladner, Jason T. title: Standards for Sequencing Viral Genomes in the Era of High-Throughput Sequencing date: 2014-06-17 pages: extension: .txt txt: ./txt/cord-001340-kqcx7lrq.txt cache: ./cache/cord-001340-kqcx7lrq.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 2 resourceName b'cord-001340-kqcx7lrq.txt' === file2bib.sh === id: cord-007923-j3jpqd7k author: O'Brien, Stephen J. 
title: Cats date: 2004-12-14 pages: extension: .txt txt: ./txt/cord-007923-j3jpqd7k.txt cache: ./cache/cord-007923-j3jpqd7k.txt Content-Encoding ISO-8859-1 Content-Type text/plain; charset=ISO-8859-1 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-007923-j3jpqd7k.txt' === file2bib.sh === id: cord-265581-pbv8mjfc author: Tong, Yaojun title: An aurora of natural products-based drug discovery is coming date: 2020-06-06 pages: extension: .txt txt: ./txt/cord-265581-pbv8mjfc.txt cache: ./cache/cord-265581-pbv8mjfc.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-265581-pbv8mjfc.txt' === file2bib.sh === id: cord-301709-kvyes2lz author: Baker, Susan C. title: Developing Bioinformatic Resources for Coronaviruses date: 2006 pages: extension: .txt txt: ./txt/cord-301709-kvyes2lz.txt cache: ./cache/cord-301709-kvyes2lz.txt Content-Encoding ISO-8859-1 Content-Type text/plain; charset=ISO-8859-1 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 2 resourceName b'cord-301709-kvyes2lz.txt' === file2bib.sh === id: cord-348059-wa1gjbck author: Gibbs, Richard A. 
title: The Human Genome Project changed everything date: 2020-08-07 pages: extension: .txt txt: ./txt/cord-348059-wa1gjbck.txt cache: ./cache/cord-348059-wa1gjbck.txt Content-Encoding ISO-8859-1 Content-Type text/plain; charset=ISO-8859-1 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-348059-wa1gjbck.txt' === file2bib.sh === OMP: Error #34: System unable to allocate necessary resources for OMP thread: OMP: System error #11: Resource temporarily unavailable OMP: Hint Try decreasing the value of OMP_NUM_THREADS. /data-disk/reader-compute/reader-cord/bin/file2bib.sh: line 39: 39819 Aborted $FILE2BIB "$FILE" > "$OUTPUT" === file2bib.sh === id: cord-310406-5pvln91x author: Asbury, Thomas M title: Genome3D: A viewer-model framework for integrating and visualizing multi-scale epigenomic information within a three-dimensional genome date: 2010-09-02 pages: extension: .txt txt: ./txt/cord-310406-5pvln91x.txt cache: ./cache/cord-310406-5pvln91x.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-310406-5pvln91x.txt' === file2bib.sh === id: cord-320005-i30t7cvr author: Pardo, A. 
title: The Human Genome and Advances in Medicine: Limits and Future Prospects date: 2004-03-31 pages: extension: .txt txt: ./txt/cord-320005-i30t7cvr.txt cache: ./cache/cord-320005-i30t7cvr.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-320005-i30t7cvr.txt' === file2bib.sh === id: cord-000556-uu1oz2ei author: Kumar, Ranjit title: RNA-Seq Based Transcriptional Map of Bovine Respiratory Disease Pathogen “Histophilus somni 2336” date: 2012-01-20 pages: extension: .txt txt: ./txt/cord-000556-uu1oz2ei.txt cache: ./cache/cord-000556-uu1oz2ei.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-000556-uu1oz2ei.txt' === file2bib.sh === id: cord-297669-22fctxk4 author: Proudfoot, Chris title: Genome editing for disease resistance in pigs and chickens date: 2019-06-25 pages: extension: .txt txt: ./txt/cord-297669-22fctxk4.txt cache: ./cache/cord-297669-22fctxk4.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-297669-22fctxk4.txt' === file2bib.sh === id: cord-318392-r9bbomvk author: Woo, Patrick CY title: Coronavirus HKU15 in respiratory tract of pigs and first discovery of coronavirus quasispecies in 5′-untranslated region date: 2017-06-21 pages: extension: .txt txt: ./txt/cord-318392-r9bbomvk.txt cache: ./cache/cord-318392-r9bbomvk.txt Content-Encoding UTF-8 Content-Type text/plain; 
charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-318392-r9bbomvk.txt' === file2bib.sh === id: cord-005281-wy0zk9p8 author: Blinov, V. M. title: Viral component of the human genome date: 2017-05-09 pages: extension: .txt txt: ./txt/cord-005281-wy0zk9p8.txt cache: ./cache/cord-005281-wy0zk9p8.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-005281-wy0zk9p8.txt' === file2bib.sh === id: cord-267714-ji88tvsl author: JAKUPCIAK, JOHN P. title: Biological agent detection technologies date: 2009-04-21 pages: extension: .txt txt: ./txt/cord-267714-ji88tvsl.txt cache: ./cache/cord-267714-ji88tvsl.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-267714-ji88tvsl.txt' === file2bib.sh === id: cord-324811-yjwavea5 author: Kidgell, Claire title: Elucidating genetic diversity with oligonucleotide arrays date: 2005 pages: extension: .txt txt: ./txt/cord-324811-yjwavea5.txt cache: ./cache/cord-324811-yjwavea5.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-324811-yjwavea5.txt' === file2bib.sh === id: cord-265857-fs6dj3dp author: Liu, Yu-Tsueng title: Infectious Disease Genomics date: 2010-12-24 pages: extension: .txt txt: 
./txt/cord-265857-fs6dj3dp.txt cache: ./cache/cord-265857-fs6dj3dp.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-265857-fs6dj3dp.txt' === file2bib.sh === id: cord-346335-el45v0a5 author: Tan, H.S. title: Fourier spectral density of the coronavirus genome date: 2020-08-11 pages: extension: .txt txt: ./txt/cord-346335-el45v0a5.txt cache: ./cache/cord-346335-el45v0a5.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-346335-el45v0a5.txt' === file2bib.sh === id: cord-277687-u3q36o3e author: Shean, Ryan C. title: VAPiD: a lightweight cross-platform viral annotation pipeline and identification tool to facilitate virus genome submissions to NCBI GenBank date: 2019-01-23 pages: extension: .txt txt: ./txt/cord-277687-u3q36o3e.txt cache: ./cache/cord-277687-u3q36o3e.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-277687-u3q36o3e.txt' === file2bib.sh === id: cord-014461-2ubh9u8r author: Nelson, Oranmiyan W. 
title: Genome sequences published outside of Standards in Genomic Sciences, July - October 2012 date: 2012-10-10 pages: extension: .txt txt: ./txt/cord-014461-2ubh9u8r.txt cache: ./cache/cord-014461-2ubh9u8r.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-014461-2ubh9u8r.txt' === file2bib.sh === id: cord-298136-mel9fxw8 author: O'Malley, Maureen A. title: Whole-genome patenting date: 2005-05-10 pages: extension: .txt txt: ./txt/cord-298136-mel9fxw8.txt cache: ./cache/cord-298136-mel9fxw8.txt Content-Encoding ISO-8859-1 Content-Type text/plain; charset=ISO-8859-1 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-298136-mel9fxw8.txt' === file2bib.sh === id: cord-352619-s2x53grh author: Payne, Natalie title: Novel Circoviruses Detected in Feces of Sonoran Felids date: 2020-09-15 pages: extension: .txt txt: ./txt/cord-352619-s2x53grh.txt cache: ./cache/cord-352619-s2x53grh.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 2 resourceName b'cord-352619-s2x53grh.txt' === file2bib.sh === id: cord-000902-ew8orn0z author: Zhao, Xiangyan title: Coevolution between simple sequence repeats (SSRs) and virus genome size date: 2012-08-30 pages: extension: .txt txt: ./txt/cord-000902-ew8orn0z.txt cache: ./cache/cord-000902-ew8orn0z.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 
'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-000902-ew8orn0z.txt' === file2bib.sh === id: cord-265329-bsypo08l author: van Dorp, Lucy title: Emergence of genomic diversity and recurrent mutations in SARS-CoV-2 date: 2020-05-05 pages: extension: .txt txt: ./txt/cord-265329-bsypo08l.txt cache: ./cache/cord-265329-bsypo08l.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-265329-bsypo08l.txt' === file2bib.sh === id: cord-330312-1pjolkql author: Liu, Y.-T. title: Infectious Disease Genomics date: 2017-01-20 pages: extension: .txt txt: ./txt/cord-330312-1pjolkql.txt cache: ./cache/cord-330312-1pjolkql.txt Content-Encoding ISO-8859-1 Content-Type text/plain; charset=ISO-8859-1 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 2 resourceName b'cord-330312-1pjolkql.txt' === file2bib.sh === id: cord-004123-1s8kuno2 author: Jaiswal, Arun Kumar title: The pan-genome of Treponema pallidum reveals differences in genome plasticity between subspecies related to venereal and non-venereal syphilis date: 2020-01-10 pages: extension: .txt txt: ./txt/cord-004123-1s8kuno2.txt cache: ./cache/cord-004123-1s8kuno2.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-004123-1s8kuno2.txt' === file2bib.sh === id: cord-348515-bqqyly23 author: Zhao, Suhui title: Re-emergent Human 
Adenovirus Genome Type 7d Caused an Acute Respiratory Disease Outbreak in Southern China After a Twenty-one Year Absence date: 2014-12-08 pages: extension: .txt txt: ./txt/cord-348515-bqqyly23.txt cache: ./cache/cord-348515-bqqyly23.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-348515-bqqyly23.txt' === file2bib.sh === id: cord-000012-p56v8wi1 author: Bigot, Yves title: Molecular evidence for the evolution of ichnoviruses from ascoviruses by symbiogenesis date: 2008-09-18 pages: extension: .txt txt: ./txt/cord-000012-p56v8wi1.txt cache: ./cache/cord-000012-p56v8wi1.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-000012-p56v8wi1.txt' === file2bib.sh === id: cord-275683-1qj9ri18 author: Roux, Simon title: Metagenomics in Virology date: 2019-06-12 pages: extension: .txt txt: ./txt/cord-275683-1qj9ri18.txt cache: ./cache/cord-275683-1qj9ri18.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-275683-1qj9ri18.txt' === file2bib.sh === id: cord-281959-g4sjyytr author: Phillippy, Adam M title: Efficient oligonucleotide probe selection for pan-genomic tiling arrays date: 2009-09-16 pages: extension: .txt txt: ./txt/cord-281959-g4sjyytr.txt cache: ./cache/cord-281959-g4sjyytr.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 
'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-281959-g4sjyytr.txt' === file2bib.sh === id: cord-304607-td0776wj author: Paszkiewicz, Konrad H. title: Omics, Bioinformatics, and Infectious Disease Research date: 2010-12-24 pages: extension: .txt txt: ./txt/cord-304607-td0776wj.txt cache: ./cache/cord-304607-td0776wj.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 2 resourceName b'cord-304607-td0776wj.txt' === file2bib.sh === id: cord-018804-wj35q88f author: Lázaro, Ester title: Genetic Variability in RNA Viruses: Consequences in Epidemiology and in the Development of New Stratgies for the Extinction of Infectivity date: 2007 pages: extension: .txt txt: ./txt/cord-018804-wj35q88f.txt cache: ./cache/cord-018804-wj35q88f.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-018804-wj35q88f.txt' === file2bib.sh === id: cord-003316-r5te5xob author: Balloux, Francois title: From Theory to Practice: Translating Whole-Genome Sequencing (WGS) into the Clinic date: 2018-12-17 pages: extension: .txt txt: ./txt/cord-003316-r5te5xob.txt cache: ./cache/cord-003316-r5te5xob.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-003316-r5te5xob.txt' === file2bib.sh === id: cord-016588-f8uvhstb author: Sintchenko, 
Vitali title: Informatics for Infectious Disease Research and Control date: 2009-10-03 pages: extension: .txt txt: ./txt/cord-016588-f8uvhstb.txt cache: ./cache/cord-016588-f8uvhstb.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-016588-f8uvhstb.txt' === file2bib.sh === id: cord-018437-yjvwa1ot author: Mitchell, Michael title: Taxonomy date: 2013-08-26 pages: extension: .txt txt: ./txt/cord-018437-yjvwa1ot.txt cache: ./cache/cord-018437-yjvwa1ot.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-018437-yjvwa1ot.txt' === file2bib.sh === id: cord-264746-gfn312aa author: Muse, Spencer title: GENOMICS AND BIOINFORMATICS date: 2012-03-29 pages: extension: .txt txt: ./txt/cord-264746-gfn312aa.txt cache: ./cache/cord-264746-gfn312aa.txt Content-Encoding ISO-8859-1 Content-Type text/plain; charset=ISO-8859-1 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-264746-gfn312aa.txt' === file2bib.sh === id: cord-017932-vmtjc8ct author: Georgiev, Vassil St. 
title: Genomic and Postgenomic Research date: 2009 pages: extension: .txt txt: ./txt/cord-017932-vmtjc8ct.txt cache: ./cache/cord-017932-vmtjc8ct.txt Content-Encoding ISO-8859-1 Content-Type text/plain; charset=ISO-8859-1 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-017932-vmtjc8ct.txt' === file2bib.sh === id: cord-016798-tv2ntug6 author: Gautam, Ablesh title: Bioinformatics Applications in Advancing Animal Virus Research date: 2019-06-06 pages: extension: .txt txt: ./txt/cord-016798-tv2ntug6.txt cache: ./cache/cord-016798-tv2ntug6.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-016798-tv2ntug6.txt' === file2bib.sh === id: cord-304498-ty41xob0 author: Denison, Mark R title: Coronaviruses: An RNA proofreading machine regulates replication fidelity and diversity date: 2011-03-01 pages: extension: .txt txt: ./txt/cord-304498-ty41xob0.txt cache: ./cache/cord-304498-ty41xob0.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-304498-ty41xob0.txt' === file2bib.sh === id: cord-334394-qgyzk7th author: Edgar, Robert C. 
title: Petabase-scale sequence alignment catalyses viral discovery date: 2020-08-10 pages: extension: .txt txt: ./txt/cord-334394-qgyzk7th.txt cache: ./cache/cord-334394-qgyzk7th.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-334394-qgyzk7th.txt' === file2bib.sh === id: cord-015850-ef6svn8f author: Saitou, Naruya title: Eukaryote Genomes date: 2013-08-22 pages: extension: .txt txt: ./txt/cord-015850-ef6svn8f.txt cache: ./cache/cord-015850-ef6svn8f.txt Content-Encoding ISO-8859-1 Content-Type text/plain; charset=ISO-8859-1 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-015850-ef6svn8f.txt' === file2bib.sh === id: cord-269124-oreg7rnj author: Spyrou, Maria A. title: Ancient pathogen genomics as an emerging tool for infectious disease research date: 2019-04-05 pages: extension: .txt txt: ./txt/cord-269124-oreg7rnj.txt cache: ./cache/cord-269124-oreg7rnj.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-269124-oreg7rnj.txt' === file2bib.sh === id: cord-350747-5t5xthk6 author: Gmyl, A. P. 
title: Diverse Mechanisms of RNA Recombination date: 2005 pages: extension: .txt txt: ./txt/cord-350747-5t5xthk6.txt cache: ./cache/cord-350747-5t5xthk6.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-350747-5t5xthk6.txt' === file2bib.sh === id: cord-340423-f8ab7413 author: Barr, J.N. title: Genetic Instability of RNA Viruses date: 2016-09-09 pages: extension: .txt txt: ./txt/cord-340423-f8ab7413.txt cache: ./cache/cord-340423-f8ab7413.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-340423-f8ab7413.txt' === file2bib.sh === id: cord-316033-xg8eb2nm author: Easton, Alice title: Molecular evidence of hybridization between pig and human Ascaris indicates an interbred species complex infecting humans date: 2020-11-06 pages: extension: .txt txt: ./txt/cord-316033-xg8eb2nm.txt cache: ./cache/cord-316033-xg8eb2nm.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-316033-xg8eb2nm.txt' === file2bib.sh === id: cord-314594-xvc8hvpq author: Singh, Roshan Kumar title: Breeding and biotechnological interventions for trait improvement: status and prospects date: 2020-09-18 pages: extension: .txt txt: ./txt/cord-314594-xvc8hvpq.txt cache: ./cache/cord-314594-xvc8hvpq.txt Content-Encoding ISO-8859-1 Content-Type text/plain; charset=ISO-8859-1 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 
'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-314594-xvc8hvpq.txt' === file2bib.sh === id: cord-302047-vv5gpldi author: Willemsen, Anouk title: On the stability of sequences inserted into viral genomes date: 2019-11-14 pages: extension: .txt txt: ./txt/cord-302047-vv5gpldi.txt cache: ./cache/cord-302047-vv5gpldi.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-302047-vv5gpldi.txt' === file2bib.sh === id: cord-022262-ck2lhojz author: Gromeier, Matthias title: Genetics, Pathogenesis and Evolution of Picornaviruses date: 2007-09-02 pages: extension: .txt txt: ./txt/cord-022262-ck2lhojz.txt cache: ./cache/cord-022262-ck2lhojz.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-022262-ck2lhojz.txt' Que is empty; done keyword-genome-cord === reduce.pl bib === id = cord-000556-uu1oz2ei author = Kumar, Ranjit title = RNA-Seq Based Transcriptional Map of Bovine Respiratory Disease Pathogen “Histophilus somni 2336” date = 2012-01-20 pages = extension = .txt mime = text/plain words = 4407 sentences = 235 flesch = 46 summary = Whole genome transcriptome analysis is a complementary method to identify "novel" genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. 
Therefore, genome structural annotation or the identification and demarcation of boundaries of functional elements in a genome (e.g., genes, non-coding RNAs, proteins, and regulatory elements) are critical elements in infectious disease systems biology. Whole genome transcriptome studies (such as whole genome tiling arrays [13, 14, 15] and high throughput sequencing [16, 17]) are complementary experimental approaches for bacterial genome annotation and can identify "novel" genes, gene boundaries, regulatory regions, intergenic regions, and operon structures. We compared the RNA-Seq based transcriptome map with the available genome annotation to identify expressed, novel, and intergenic regions in the genome. The single nucleotide resolution map helped uncover the structure and complexity of this pathogen's transcriptome and led to the identification of novel, small RNAs and protein coding genes as well as gene co-expression. cache = ./cache/cord-000556-uu1oz2ei.txt txt = ./txt/cord-000556-uu1oz2ei.txt === reduce.pl bib === id = cord-000012-p56v8wi1 author = Bigot, Yves title = Molecular evidence for the evolution of ichnoviruses from ascoviruses by symbiogenesis date = 2008-09-18 pages = extension = .txt mime = text/plain words = 6419 sentences = 293 flesch = 44 summary = CONCLUSION: Our results provide molecular evidence supporting the origin of ichnoviruses from ascoviruses by lateral transfer of ascoviral genes into ichneumonid wasp genomes, perhaps the first example of symbiogenesis between large DNA viruses and eukaryotic organisms. With respect to both species number and mechanisms that lead to successful parasitism, endoparasitic wasps are known to inject secretions at oviposition, but only a few lineages use viruses or virus-like particles (VLPs) to evade or to suppress host defences. 
Extending our investigations to proteins encoded by open reading frames of certain ascoviruses and bracoviruses, hosts and bacteria, in the light of recent analyses about the involvement of the replication machinery of virus groups related to ascoviruses in lateral gene transfer [29], we discuss the robustness and the limits of the molecular evidence supporting an ascovirus origin for ichnovirus lineages. cache = ./cache/cord-000012-p56v8wi1.txt txt = ./txt/cord-000012-p56v8wi1.txt === reduce.pl bib === id = cord-004123-1s8kuno2 author = Jaiswal, Arun Kumar title = The pan-genome of Treponema pallidum reveals differences in genome plasticity between subspecies related to venereal and non-venereal syphilis date = 2020-01-10 pages = extension = .txt mime = text/plain words = 5934 sentences = 363 flesch = 51 summary = title: The pan-genome of Treponema pallidum reveals differences in genome plasticity between subspecies related to venereal and non-venereal syphilis pallidum strains isolated from different parts of the world and a diverse range of hosts were comparatively analysed using pan-genomic strategy. pertenue, we found differences in the presence/absence of pathogenicity islands (PAIs) and genomic islands (GIs) on subsp.-based study. In this work, we perform a pan-genome approach to better understand the differences of Treponema pallidum infections in the broad spectrum and how genome plasticity is related to the symptom patterns. Finally, we provide insights into the specific subsets (singletons and the pan- and core genomes) of 53 genomes of T. pallidum strains and correlate these subsets with the plasticity of pathogenicity islands and virulence genes. The subspecies responsible for non-venereal syphilis is Treponema pallidum subsp. Genes which are present in pallidum subspecies pathogenicity islands (PAIs) or genomic islands (GIs) are absent in the subspecies endemicum and pertenue. 
cache = ./cache/cord-004123-1s8kuno2.txt txt = ./txt/cord-004123-1s8kuno2.txt === reduce.pl bib === id = cord-012473-p66of6kq author = Celniker, Susan E. title = Unlocking the secrets of the genome date = 2009-06-17 pages = extension = .txt mime = text/plain words = 2556 sentences = 119 flesch = 37 summary = The primary objective of the Human Genome Project was to produce high-quality sequences not just for the human genome but also for those of the chief model organisms: Escherichia coli, yeast (Saccharomyces cerevisiae), worm (Caenorhabditis elegans), fly (Drosophila melanogaster) and mouse (Mus musculus). Free access to the resultant data has prompted much biological research, including development of a map of common human genetic variants (the International HapMap Project) [1], expression profiling of healthy and diseased cells [2] and in-depth studies of many individual genes. On the basis of this experience, the NHGRI launched two complementary programmes in 2007: an expansion of the human ENCODE project to the whole genome (www.genome.gov/ENCODE) and the model organism ENCODE (modENCODE) project to generate a comprehensive annotation of the functional elements in the C. The research communities that study these two organisms will rapidly make use of the modENCODE results, deploying powerful experimental approaches that are often not possible or practical in mammals, including genetic, genomic, transgenic, biochemical and RNAi assays. 
cache = ./cache/cord-012473-p66of6kq.txt txt = ./txt/cord-012473-p66of6kq.txt === reduce.pl bib === id = cord-018804-wj35q88f author = Lázaro, Ester title = Genetic Variability in RNA Viruses: Consequences in Epidemiology and in the Development of New Stratgies for the Extinction of Infectivity date = 2007 pages = extension = .txt mime = text/plain words = 8510 sentences = 398 flesch = 44 summary = High error prone replication, together with the short replication times and large population sizes typical of RNA viruses, instead of being a handicap for survival provides an extraordinary evolutionary advantage by permitting the generation of a wide reservoir of mutants with different phenotypic properties [7] . However, the fact that DNA organisms, which usually live in constant environments, have evolved corrector activities, whereas RNA viruses have not, suggests that replication with high error rates is a selected character that strongly favours viral adaptation to fast changing conditions. Quasi-species replicating during a long time in a near-constant environment in the absence of large population size fluctuations can present a low rate of fixation of mutations in the consensus sequence, despite the continuous occurrence of mutants that is characteristic of the underlying dynamics of the population. The infection of a new host constitutes a sudden change in the environment in which viral replication takes place, usually with the consequence of a drastic decrease in the average fitness of the virus population, which prevents further transmission. 
cache = ./cache/cord-018804-wj35q88f.txt txt = ./txt/cord-018804-wj35q88f.txt === reduce.pl bib === id = cord-018437-yjvwa1ot author = Mitchell, Michael title = Taxonomy date = 2013-08-26 pages = extension = .txt mime = text/plain words = 9283 sentences = 561 flesch = 48 summary = Classification is based on the genomic nucleic acid used by the virus (DNA or RNA), strandedness (single or double stranded), and method of replication. The nucleocapsids of some viruses are surrounded by envelopes composed of lipid bilayers and host- or viral-encoded proteins. The sequence of negative-sense ssRNA is complementary to the coding sequence for translation, so mRNA must be synthesized by RNA polymerase, typically carried within the virion, before translation into viral proteins. Among the families of viruses able to infect humans and other vertebrate hosts, there are many species that target and cause disease in the lung. The nucleocapsid is surrounded by an envelope derived from host-cell membrane and viral envelope proteins, including hepatitis B surface antigen. The genome of human parainfluenza viruses is ~15 kb in length with an organization and six reading frames (N, P, M, F, HN, L) typical of the Paramyxoviridae (Karron and Collins 2007). 
cache = ./cache/cord-018437-yjvwa1ot.txt txt = ./txt/cord-018437-yjvwa1ot.txt === reduce.pl bib === id = cord-264746-gfn312aa author = Muse, Spencer title = GENOMICS AND BIOINFORMATICS date = 2012-03-29 pages = extension = .txt mime = text/plain words = 10976 sentences = 583 flesch = 58 summary = The success of this project (it came in almost 3 years ahead of time and 10% under budget, while at the same time providing more data than originally planned) depended on innovations in a variety of areas: breakthroughs in basic molecular biology to allow manipulation of DNA and other compounds; improved engineering and manufacturing technology to produce equipment for reading the sequences of DNA; advances in robotics and laboratory automation; development of statistical methods to interpret data from sequencing projects; and the creation of specialized computing hardware and software systems to circumvent massive computational barriers that faced genome scientists. Although the list of important biotechnologies changes on an almost daily basis, there are three prominent data types in today's environment: (1) genome sequences provide the starting point that allows scientists to begin understanding the genetic underpinnings of an organism; (2) measurements of gene expression levels facilitate studies of gene regulation, which, among other things, help us to understand how an organism's genome interacts with its environment; and (3) genetic polymorphisms are variations from individual to individual within species, and understanding how these variations correlate with phenotypes such as disease susceptibility is a crucial element of modern biomedical research. cache = ./cache/cord-264746-gfn312aa.txt txt = ./txt/cord-264746-gfn312aa.txt === reduce.pl bib === id = cord-267714-ji88tvsl author = JAKUPCIAK, JOHN P. 
title = Biological agent detection technologies date = 2009-04-21 pages = extension = .txt mime = text/plain words = 3526 sentences = 175 flesch = 35 summary = PCR-based methods have critical limitations, since they depend on a priori knowledge of what sequence to detect in a sample further complicated by recent demonstrations of greater variability in genomic sequence than expected. A platform for genome identification of a specimen from any source must not only be sensitive and specific, but must also detect a variety of pathogens with high accuracy, including modified or previously uncharacterized agents, and this challenge is daunting when identification must be achieved using nucleic acids in a complex sample matrix. The build-out of genome identification DNA sequencing technology in the form of practical instrumentation will be achieved by incorporating the critical requirements for accurate long reads, without dependency for template amplification, capable of manipulating terabytes of data to provide reliable and useful identification of genetic sequences within any unknown sample, whether clinical, environmental, or other type of specimen. cache = ./cache/cord-267714-ji88tvsl.txt txt = ./txt/cord-267714-ji88tvsl.txt === reduce.pl bib === id = cord-016588-f8uvhstb author = Sintchenko, Vitali title = Informatics for Infectious Disease Research and Control date = 2009-10-03 pages = extension = .txt mime = text/plain words = 8186 sentences = 393 flesch = 36 summary = The goal of infectious disease informatics is to optimize the clinical and public health management of infectious diseases through improvements in the development and use of antimicrobials, the design of more effective vaccines, the identification of biomarkers for life-threatening infections, a better understanding of host-pathogen interactions, and biosurveillance and clinical decision support. 
"New Age" infectious disease informatics rests on advances in microbial genomics, the sequencing and comparative study of the genomes of pathogens, and proteomics or the identification and characterization of their protein related properties and reconstruction of metabolic and regulatory pathways (Bansal 2005) . The figure was produced using Artemis software (The Wellcome Trust Sanger Institute, UK) 1 Informatics for Infectious Disease Research and Control evidence-based gene calling or translating alignments of the DNA sequence to known proteins; and (3) aligning cDNAs from the same or related species. cache = ./cache/cord-016588-f8uvhstb.txt txt = ./txt/cord-016588-f8uvhstb.txt === reduce.pl bib === id = cord-014461-2ubh9u8r author = Nelson, Oranmiyan W. title = Genome sequences published outside of Standards in Genomic Sciences, July - October 2012 date = 2012-10-10 pages = extension = .txt mime = text/plain words = 4124 sentences = 454 flesch = 44 summary = Complete Genome Sequence of Brucella abortus A13334, a New Strain Isolated from the Fetal Gastric Fluid of Dairy Cattle Complete Genome Sequence of Brucella canis Strain HSK A52141, Isolated from the Blood of an Infected Dog Complete Genome Sequence of Streptococcus salivarius PS4, a Strain Isolated from Human Milk Complete Genome Sequences of Probiotic Strains Bifidobacterium animalis subsp. 
Complete Genome Sequence of Corynebacterium pseudotuberculosis Strain 1/06-A, Isolated from a Horse in North America Complete Genome Sequence of Bacteriophage BC-611 Specifically Infecting Enterococcus faecalis Strain NP-10011 Characterization and Complete Genome Sequence of Human Coronavirus NL63 Isolated in China Complete Genome Sequence of a Novel Pararetrovirus Isolated from Soybean Complete Genome Sequence of a Polyomavirus Isolated from Horses Complete Genome Sequence of a Novel Porcine Sapelovirus Strain YC2011 Isolated from Piglets with Diarrhea Draft Genome Sequence of Aspergillus oryzae Strain 3.042 cache = ./cache/cord-014461-2ubh9u8r.txt txt = ./txt/cord-014461-2ubh9u8r.txt === reduce.pl bib === id = cord-265329-bsypo08l author = van Dorp, Lucy title = Emergence of genomic diversity and recurrent mutations in SARS-CoV-2 date = 2020-05-05 pages = extension = .txt mime = text/plain words = 4915 sentences = 270 flesch = 49 summary = Three sites in Orf1ab in the regions encoding Nsp6, Nsp11, Nsp13, and one in the Spike protein are characterised by a particularly large number of recurrent mutations (>15 events) which may signpost convergent evolution and are of particular interest in the context of adaptation of SARS-CoV-2 to the human host. The extraordinary availability of genomic data during the COVID-19 pandemic has been made possible thanks to a tremendous effort by hundreds of researchers globally depositing SARS-CoV-2 assemblies (Table S1) and the proliferation of close to real time data visualisation and analysis tools including NextStrain (https://nextstrain.org) and CoV-GLUE (http://cov-glue.cvr.gla.ac.uk). In this work we use this data to analyse the genomic diversity that has emerged in the global population of SARS-CoV-2 since the beginning of the COVID-19 pandemic, based on a download of 7710 assemblies. 
The genomic diversity of the global SARS-CoV-2 population being recapitulated in multiple countries points to extensive worldwide transmission of COVID-19, likely from extremely early on in the pandemic. cache = ./cache/cord-265329-bsypo08l.txt txt = ./txt/cord-265329-bsypo08l.txt === reduce.pl bib === id = cord-269124-oreg7rnj author = Spyrou, Maria A. title = Ancient pathogen genomics as an emerging tool for infectious disease research date = 2019-04-05 pages = extension = .txt mime = text/plain words = 11932 sentences = 518 flesch = 42 summary = Examples of tools that have shown their effectiveness with ancient metagenomic DNA include the widely used Basic Local Alignment Search Tool (BLAST) [68]; the MEGAN Alignment Tool (MALT) [41], which involves a taxonomic binning algorithm that can use whole genome databases (such as the National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database [69]); Metagenomic Phylogenetic Analysis (MetaPhlAn) [70], which is also integrated into the metagenomic pipeline MetaBIT [71] and uses thousands (or millions) of marker genes for the distinction of specific microbial clades; or Kraken [72], an alignment-free sequence classifier that is based on k-mer matching of a query to a constructed database. Similar limitations can arise when the evolutionary history of a microorganism is vastly affected by recombination, as observed for HBV [44, 53], although HBV molecular dating was recently attempted using a different genomic data set and suggested that the currently explored diversity of Old and New World primate lineages (including all human genotypes) may have emerged within the last 20,000 years [43]. 
cache = ./cache/cord-269124-oreg7rnj.txt txt = ./txt/cord-269124-oreg7rnj.txt === reduce.pl bib === id = cord-265581-pbv8mjfc author = Tong, Yaojun title = An aurora of natural products-based drug discovery is coming date = 2020-06-06 pages = extension = .txt mime = text/plain words = 3077 sentences = 155 flesch = 46 summary = With recent scientific advances combining metabolic sciences and technology, multi-omics, big data, combinatorial biosynthesis, synthetic biology, genome editing technology (such as CRISPR), artificial intelligence (AI), and 3D printing, the "high-hanging fruit" is becoming more and more accessible with reduced costs. The incredible rate of development in genome sequencing, modern metabolic engineering, synthetic biology, advanced genome editing, big data, artificial intelligence (AI), and 3D printing together with the growing microbial strain collections enable us to access the previously inaccessible natural products. It starts with genome mining (the analysis of high-quality whole genome information), which requires bioinformatics, big data, and even AI; to pathway cloning (refactoring), expression and fermentation, which needs design-build-test-learn (DBTL) cycle-based metabolic engineering; to the target natural product identification, which requires modern chemical analysis; and to later compound modification and clinical studies, which needs biochemistry and cell biology. 
cache = ./cache/cord-265581-pbv8mjfc.txt txt = ./txt/cord-265581-pbv8mjfc.txt === reduce.pl bib === id = cord-265857-fs6dj3dp author = Liu, Yu-Tsueng title = Infectious Disease Genomics date = 2010-12-24 pages = extension = .txt mime = text/plain words = 4341 sentences = 233 flesch = 45 summary = The completed or ongoing genome projects will provide enormous opportunities for the discovery of novel vaccines and drug targets against human pathogens as well as the improvement of diagnosis and discovery of infectious agents and the development of new strategies for invertebrate vector control. The genomes of human malaria parasite Plasmodium falciparum and its major mosquito vector Anopheles gambiae were published in 2002 (Gardner et al., 2002; Holt et al., 2002). Genome sequencing projects for other important human disease vectors are in progress (Megy et al., 2009). One of the similar efforts for human pathogens is the NIH Influenza Genome Sequencing Project. The completed or ongoing genome projects (Table 10.1) will provide enormous opportunities for the discovery of novel vaccines and drug targets against human pathogens as well as the improvement of diagnosis and discovery of infectious agents and the development of new strategies for invertebrate vector control. cache = ./cache/cord-265857-fs6dj3dp.txt txt = ./txt/cord-265857-fs6dj3dp.txt === reduce.pl bib === id = cord-000902-ew8orn0z author = Zhao, Xiangyan title = Coevolution between simple sequence repeats (SSRs) and virus genome size date = 2012-08-30 pages = extension = .txt mime = text/plain words = 5822 sentences = 302 flesch = 53 summary = The results showed that simple sequence repeats (SSRs) are strongly, positively and significantly correlated with genome size. 
Relative abundance and relative density were examined to make the SSRs comparison parallel among differently sized species genomes; principal component analysis (PCA) was designed to investigate which repeat class(es) made a greater contribution to the variance among virus species as well as the relationships between repeat classes. Therefore, the 257 genome sequences were selected as samples for the analysis of the relationship between SSRs distribution and genome size at the level of the whole virus. We surveyed the distribution of different SSR classes in virus genomes to investigate the relationship between repeat classes (mono-, di-, tri-, tetra-, penta- and hexa-) and genome sequence length. Coevolution between simple sequence repeats (SSRs) and virus genome size cache = ./cache/cord-000902-ew8orn0z.txt txt = ./txt/cord-000902-ew8orn0z.txt === reduce.pl bib === id = cord-001340-kqcx7lrq author = Ladner, Jason T. title = Standards for Sequencing Viral Genomes in the Era of High-Throughput Sequencing date = 2014-06-17 pages = extension = .txt mime = text/plain words = 2512 sentences = 121 flesch = 40 summary = Genome sequences play a critical role in our understanding of viral evolution, disease epidemiology, surveillance, diagnosis, and countermeasure development and thus represent valuable resources which must be properly documented and curated to ensure future utility. Here, we outline a set of viral genome quality standards, similar in concept to those proposed for large DNA genomes (4) but focused on the particular challenges of and needs for research on small RNA/DNA viruses, including characterization of the genomic diversity inherent in all viral samples/populations. Therefore, we have used technology-agnostic criteria to define five standard categories designed to encompass the levels of completeness most often encountered in viral sequencing projects. 
There is a trend toward requiring a complete genome sequence when a description of a novel virus is being published, and we agree that this is a good goal; however, the amount of time and resources required to complete the last 1 to 2% of a viral genome is often cost and time prohibitive for projects sequencing a large number of samples, and in most cases the very ends of the segments are not essential for proper identification and characterization. cache = ./cache/cord-001340-kqcx7lrq.txt txt = ./txt/cord-001340-kqcx7lrq.txt === reduce.pl bib === id = cord-007923-j3jpqd7k author = O'Brien, Stephen J. title = Cats date = 2004-12-14 pages = extension = .txt mime = text/plain words = 1212 sentences = 59 flesch = 44 summary = Wild cats dominate their habitat but require vast expanses to survive, which explains the tragic depredation such that every species of Felidae, except the domestic cat, is considered either endangered or threatened in the wild today by CITES, IUCN Red Book and other monitors of the world's most endangered species. Domestic cats and dogs enjoy more medical scrutiny than any species except humans. The cat offers the promise of a second carnivore species (in addition to the dog, which shares a common ancestor with cats dating back to approximately 60 million years ago) to improve human genome annotation, as well as to complement the biomedical and genomic discoveries that make the feline genome attractive. The conserved genome of the cat is retained in the other 36 Felidae species, as well as most of the 246 species of the Carnivora order, the only reshuffled exceptions occuring in the dog and bear families. 
cache = ./cache/cord-007923-j3jpqd7k.txt txt = ./txt/cord-007923-j3jpqd7k.txt === reduce.pl bib === id = cord-324811-yjwavea5 author = Kidgell, Claire title = Elucidating genetic diversity with oligonucleotide arrays date = 2005 pages = extension = .txt mime = text/plain words = 5561 sentences = 268 flesch = 37 summary = Oligonucleotide microarrays, predominantly high-density oligonucleotide arrays, have emerged as the principal platforms for performing genome-wide diversity analysis. Since a number of complex issues still remain with high-throughput microarray-based SNP genotyping in humans, in the remainder of this review, we will discuss the application of high-density oligonucleotide arrays to elucidate genetic diversity, with particular focus on studies undertaken with Saccharomyces cerevisiae (Winzeler et al. falciparum (Clark 2002), the genome-wide analysis facilitated by hybridization of genomic DNA to the Affymetrix microarray identified significant differences in potential selection pressure across different gene families and locations within the chromosome (Volkman et al. Although SNPs and deletions can be readily identified using Affymetrix high-density arrays, more complex types of genetic diversity may also be determined using this platform. cache = ./cache/cord-324811-yjwavea5.txt txt = ./txt/cord-324811-yjwavea5.txt === reduce.pl bib === id = cord-005281-wy0zk9p8 author = Blinov, V. M. title = Viral component of the human genome date = 2017-05-09 pages = extension = .txt mime = text/plain words = 6583 sentences = 306 flesch = 44 summary = In the human genome, this capacity is determined by the portion of chromosomal DNA, which does not contain species-specific protein-encoding sequences and, thus, can basically make a place for novel information that will be modified to reach a new balance. 
In fact, the scope of the described phenomena is not limited to retroviruses as such, since the ubiquity of retroviral elements in animal genomes, their activity in germline cells [31] , along with the fact that viral replication depends significantly on RNA expression, allow retroviruses to contribute in different ways to the insertion of nonretroviral genes into animal germline cells. Finally, the ability to incorporate parts of the viral genome into the chromosomal DNA of host germline cells can vary strongly among different taxonomic groups of viruses, i.e., orders, families, genera, and even species If insertions of viral sequences remain functionally active in the host cell genome, they can give rise to either proteins that function in a new environment or untranslated RNAs of different sizes. cache = ./cache/cord-005281-wy0zk9p8.txt txt = ./txt/cord-005281-wy0zk9p8.txt === reduce.pl bib === id = cord-017932-vmtjc8ct author = Georgiev, Vassil St. title = Genomic and Postgenomic Research date = 2009 pages = extension = .txt mime = text/plain words = 8476 sentences = 360 flesch = 36 summary = The family Enterobacteriaceae encompasses a diverse group of bacteria including many of the most important human pathogens (Salmonella, Yersinia, Klebsiella, Shigella), as well as one of the most enduring laboratory research organisms, the nonpathogenic Escherichia coli K12. To this end, NIAID has made significant investments in large-scale sequencing projects, including projects to sequence the complete genomes of many pathogens, such as the bacteria that cause tuberculosis, gonorrhea, chlamydia, and cholera, as well as organisms that are considered agents of bioterrorism. The availability of microbial and human DNA sequences opens up new opportunities and allows scientists to perform functional analyses of genes and proteins in whole genomes and cells, as well as the host's immune response and an individual's genetic susceptibility to pathogens. 
The PFGRC was established in 2001 to provide and distribute to the broader research community a wide range of genomic resources, reagents, data, and technologies for the functional analysis of microbial pathogens and invertebrate vectors of infectious diseases. cache = ./cache/cord-017932-vmtjc8ct.txt txt = ./txt/cord-017932-vmtjc8ct.txt === reduce.pl bib === id = cord-348059-wa1gjbck author = Gibbs, Richard A. title = The Human Genome Project changed everything date = 2020-08-07 pages = extension = .txt mime = text/plain words = 1732 sentences = 91 flesch = 54 summary = Thirty years on from the launch of the Human Genome Project, Richard Gibbs reflects on the promises that this voyage of discovery bore. He developed basic methods for DNA and mutation analysis and was an early contributor to the Human Genome Project (HGP), leading one of five sites that generated the majority of the sequence. The power of advances in genomics and computers was revealed in the spectacular series of post-HGP projects that were of comparable scale. Some still tally the success of the HGP from lists of new drugs or therapies and argue that world-changing examples in biology, such as the spectacular advances of gene editing tools or the expansion of cancer therapeutics through targeted immunotherapy, are largely based on microbial, cellular and animal studies rather than genomics. cache = ./cache/cord-348059-wa1gjbck.txt txt = ./txt/cord-348059-wa1gjbck.txt === reduce.pl bib === id = cord-304607-td0776wj author = Paszkiewicz, Konrad H. 
title = Omics, Bioinformatics, and Infectious Disease Research date = 2010-12-24 pages = extension = .txt mime = text/plain words = 7022 sentences = 367 flesch = 46 summary = This chapter discusses the current state of play of bioinformatics related to genomics and transcriptomics, briefs metagenomics that finds use in infectious disease research as well as the random sequencing of genomes from a variety of organisms. Bioinformatics plays a key role at several steps in genomics, comparative genomics, and functional genomics: sequence alignment, assembly, identification of single nucleotide polymorphisms (SNP), gene prediction, quantitative analysis of transcription data, etc. The term "metagenomics" was originally used to describe the sequencing of genomes of uncultured microorganisms in order to explore their abilities to produce natural products (Handelsman et al., 1998 , Rondon et al., 2000 and subsequently resulted in novel insights into the ecology and evolution of microorganisms on a scale not imagined possible before (see Cardenas and Tiedje, 2008; Hugenholtz and Tyson, 2008 for an overview). However, metagenomics now finds use in infectious disease research as well as the random sequencing of genomes from a variety of organisms from, for example, patient material that could lead to the identification of the cause of disease. cache = ./cache/cord-304607-td0776wj.txt txt = ./txt/cord-304607-td0776wj.txt === reduce.pl bib === id = cord-281959-g4sjyytr author = Phillippy, Adam M title = Efficient oligonucleotide probe selection for pan-genomic tiling arrays date = 2009-09-16 pages = extension = .txt mime = text/plain words = 7392 sentences = 360 flesch = 54 summary = The viability of the algorithm is demonstrated by array designs for seven different bacterial pan-genomes and, in particular, the design of a 385,000 probe array that fully tiles the genomes of 20 different Listeria monocytogenes strains with overlapping probes at greater than twofold coverage. 
In order to both characterize new strains based on genetic content, and detect polymorphism at a higher resolution in small RNAs (sRNAs) and intergenic sequences, the array was required to cover all pan-genomic sequences with a high density of probes. To see the similarities between the Pan-Tiling and Minimum Hitting Set problems, let the sequence G be a concatenation of all the genomes from a species, and let W = {w 1 , w 2 ,..., w m } be the set of m intervals that results from segmenting G into non-overlapping, end-to-end, length l windows. cache = ./cache/cord-281959-g4sjyytr.txt txt = ./txt/cord-281959-g4sjyytr.txt === reduce.pl bib === id = cord-016798-tv2ntug6 author = Gautam, Ablesh title = Bioinformatics Applications in Advancing Animal Virus Research date = 2019-06-06 pages = extension = .txt mime = text/plain words = 6978 sentences = 405 flesch = 44 summary = The chapter further provides information on the tools that can be used to study viral epidemiology, phylogenetic analysis, structural modelling of proteins, epitope recognition and open reading frame (ORF) recognition and tools that enable to analyse host-viral interactions, gene prediction in the viral genome, etc. This chapter will introduce virologists to some of the common as well virus-specific bioinformatics tools that the researches can use to analyse viral sequence data to elucidate the viral dynamics, evolution and preventive therapeutics. Novel virus types comprise of new CDSs that are different than previously known CDSs. There are multiple databases and tools available for analysis of human viruses; however, there are still only a limited number of resources designed specifically for veterinary viruses. VIRsiRNAdb is an online curated repository that stores experimentally validated research data of siRNA and short hairpin RNA (shRNA) targeting diverse genes of 42 important human viruses, including influenza virus (Tyagi et al. 
cache = ./cache/cord-016798-tv2ntug6.txt txt = ./txt/cord-016798-tv2ntug6.txt === reduce.pl bib === === reduce.pl bib === id = cord-301709-kvyes2lz author = Baker, Susan C. title = Developing Bioinformatic Resources for Coronaviruses date = 2006 pages = extension = .txt mime = text/plain words = 1017 sentences = 58 flesch = 44 summary = The database will contain high-quality curated data: sequence annotations from published whole and partial genomes; relevant experimental data; metabolic pathway data; taxonomic data; literature citations; and a suite of visualization and analysis tools. The results of these programs and searches assembled by the annotation pipeline are used to propose biological features that are also stored in the curation database that uses the Genomics Unified Schema (GUS). For the purposes of defining minimal, non-redundant set of genes characteristic of the category, one genome (usually the best-known or best-characterized) is identified as the "reference genome"; the remaining members of the class are called "associated genomes." For example, the Tor2 and Urbani isolates were the first two SARS coronavirus genomes to be sequenced and therefore were named as reference genomes. This allows high-value, manually curated information from the corresponding reference genes to be automatically linked to the associated genes, provided minimal similarity criteria based on automated sequence analysis are satisfied. cache = ./cache/cord-301709-kvyes2lz.txt txt = ./txt/cord-301709-kvyes2lz.txt === reduce.pl bib === id = cord-298136-mel9fxw8 author = O'Malley, Maureen A. title = Whole-genome patenting date = 2005-05-10 pages = extension = .txt mime = text/plain words = 4106 sentences = 189 flesch = 44 summary = Gene patenting is now a familiar commercial practice, but there is little awareness that several patents claim ownership of the complete genome sequence of a prokaryote or virus. 
However, further analysis reveals that patent specifications describing whole-genome inventions use arguments that imply that genomes are qualitatively different from individual genes. This standard allows several sub-inventions to be linked together by a common "general inventive concept", but prevents unrelated inventions from succeeding as a single Abstract | Gene patenting is now a familiar commercial practice, but there is little awareness that several patents claim ownership of the complete genome sequence of a prokaryote or virus. If there are any qualitative differences between patents for whole genomes and those for DNA fragments, it seems likely that they will be found in the utility arguments -the most contested feature of recent gene patenting. cache = ./cache/cord-298136-mel9fxw8.txt txt = ./txt/cord-298136-mel9fxw8.txt === reduce.pl bib === id = cord-268795-tjmx6msm author = Sardar, Rahila title = Comparative analyses of SAR-CoV2 genomes from different geographical locations and other coronavirus family genomes reveals unique features potentially consequential to host-virus interaction and pathogenesis date = 2020-03-21 pages = extension = .txt mime = text/plain words = 2257 sentences = 128 flesch = 47 summary = title: Comparative analyses of SAR-CoV2 genomes from different geographical locations and other coronavirus family genomes reveals unique features potentially consequential to host-virus interaction and pathogenesis We have performed an integrated sequence-based analysis of SARS-CoV2 genomes from different geographical locations in order to identify its unique features absent in SARS-CoV and other related coronavirus family genomes, conferring unique infection, facilitation of transmission, virulence and immunogenic features to the virus. Our analysis reveals nine host miRNAs which can potentially target SARS-CoV2 genes. Our analysis shows unique host-miRNAs targeting SARS-CoV2 virus genes. 
CELLO2GO (7)server was used to infer biological function for each protein of SARS-CoV2 genome with their localization prediction. Assembled SARS-CoV2 genomes sequences in FASTA format from India, USA, China, Italy and Nepal used for coronavirus typing tool analysis. For the phylogenetic analysis, we compared the sequences of 6 SARS-CoV2 isolates from different countries namely, Wuhan, India, Italy, USA and Nepal along with other corona virus species ( Figure 1 ). cache = ./cache/cord-268795-tjmx6msm.txt txt = ./txt/cord-268795-tjmx6msm.txt === reduce.pl bib === id = cord-003316-r5te5xob author = Balloux, Francois title = From Theory to Practice: Translating Whole-Genome Sequencing (WGS) into the Clinic date = 2018-12-17 pages = extension = .txt mime = text/plain words = 7340 sentences = 327 flesch = 34 summary = WGS-based strain identification gives a far superior resolution In principle, WGS can provide highly relevant information for clinical microbiology in near-real-time, from phenotype testing to tracking outbreaks. As an example, genome assembly might appear to be a bottleneck for real-time WGS diagnostics, but is probably rarely required; sufficient characterization of an isolate can be made by analysis of the k-mers in the raw sequence data, which is orders of magnitude faster. These include, among others: the current costs of WGS, which remain far from negligible despite a common belief that sequencing costs have plummeted; a lack of training in, and possible cultural resistance to, bioinformatics among clinical microbiologists; a lack of the necessary computational infrastructure in most hospitals; the inadequacy of existing reference microbial genomics databases necessary for reliable AMR and virulence profiling; and the difficulty of setting up effective, standardized, and accredited bioinformatics protocols. 
cache = ./cache/cord-003316-r5te5xob.txt txt = ./txt/cord-003316-r5te5xob.txt === reduce.pl bib === id = cord-277687-u3q36o3e author = Shean, Ryan C. title = VAPiD: a lightweight cross-platform viral annotation pipeline and identification tool to facilitate virus genome submissions to NCBI GenBank date = 2019-01-23 pages = extension = .txt mime = text/plain words = 4071 sentences = 212 flesch = 49 summary = title: VAPiD: a lightweight cross-platform viral annotation pipeline and identification tool to facilitate virus genome submissions to NCBI GenBank In order to accept submitted viral genomic data, NCBI GenBank requires 1) viral sequence complete with at least one protein annotation, 2) author/depositor metadata, and 3) viral sequence metadata, such as strain, collection date, collection location, and coverage. VAPiD handles batch submissions of multiple viruses of different types without prior knowledge of the viral species, correctly annotates RNA editing and ribosomal slippage, performs spellchecking on annotations, handles batch or individual submission of metadata, runs with a simple one-line command, and creates annotated viral sequence files for GenBank submission. This first example is the task that the authors originally wrote VAPiD for -annotating large numbers of genomes from different viral species, which mirrors the type of data that many clinical and public health laboratories may encounter. 
cache = ./cache/cord-277687-u3q36o3e.txt txt = ./txt/cord-277687-u3q36o3e.txt === reduce.pl bib === id = cord-348515-bqqyly23 author = Zhao, Suhui title = Re-emergent Human Adenovirus Genome Type 7d Caused an Acute Respiratory Disease Outbreak in Southern China After a Twenty-one Year Absence date = 2014-12-08 pages = extension = .txt mime = text/plain words = 6569 sentences = 323 flesch = 43 summary = Recombination analysis reveals this genome differs from the 1950s-era prototype and vaccine strains by a lateral gene transfer, substituting the coding region for the L1 52/55 kDa DNA packaging protein from HAdV-16. Recombination analysis reveals this genome differs from the 1950s-era prototype and vaccine strains by a lateral gene transfer, substituting the coding region for the L1 52/55 kDa DNA packaging protein from HAdV-16. Thorough characterization of these pathogens is evidenced by the availability of two genome sequences (JF800905 and JX625134), both of which are further identified as the HAdV-7d genome type in this report, and shown to be nearly identical to this report of an isolate from a 2011 ARD outbreak in Guangdong Province (strain DG01_2011) by comparative genomics and, in particular, in silico REA pattern analysis, as presented in Figure 2 . cache = ./cache/cord-348515-bqqyly23.txt txt = ./txt/cord-348515-bqqyly23.txt === reduce.pl bib === id = cord-320005-i30t7cvr author = Pardo, A. title = The Human Genome and Advances in Medicine: Limits and Future Prospects date = 2004-03-31 pages = extension = .txt mime = text/plain words = 4919 sentences = 211 flesch = 49 summary = The HGP's initial objectives were fulfilled 2 years ahead of schedule, and, in addition to compiling a highly accurate sequence of the human genome which has been made freely available and accessible to everyone, the Consortium has developed a set of new technologies and has constructed genetic maps of the genomes of various organisms. 
Around the same time, the public consortium known as the Human Genome Project was formed, and this organization announced a 15-year plan (from 1990 to 2005) with the following objectives: a) to determine the complete nucleotide sequence of human DNA and identify all the genes in human DNA (estimated to number between 50 000 and 100 000); b) to build physical and genetic maps; c) to analyze the genomes of selected organisms used in research as model systems (eg, the mouse); d) to develop new technologies; and e) to analyze and debate the ethical and legal implications for individuals and for society as a whole. cache = ./cache/cord-320005-i30t7cvr.txt txt = ./txt/cord-320005-i30t7cvr.txt === reduce.pl bib === id = cord-350747-5t5xthk6 author = Gmyl, A. P. title = Diverse Mechanisms of RNA Recombination date = 2005 pages = extension = .txt mime = text/plain words = 8187 sentences = 463 flesch = 41 summary = It was believed until recently that the only possible mechanism of RNA recombination is replicative template switching, with synthesis of a complementary strand starting on one viral RNA molecule and being completed on another. An illustrative example of deletions is provided by defective interfering (DI) genomes, which accumulate in a virus population upon high-multiplicity infections and lack a fragment of the sequence coding for viral proteins [5] [6] [7] . A special role in the variation of RNA viruses is played by recombination, the generation of new genomes from two or more parental RNAs. Recombination between viral RNA molecules was observed for the first time as early as in the 1960s in the poliovirus [14, 15] . In other words, it is possible to assume that some of the mechanisms of nonreplicative RNA recombination play an important role in the evolution of not only viral, but also cell genomes [51, 90] . 
cache = ./cache/cord-350747-5t5xthk6.txt txt = ./txt/cord-350747-5t5xthk6.txt === reduce.pl bib === id = cord-302047-vv5gpldi author = Willemsen, Anouk title = On the stability of sequences inserted into viral genomes date = 2019-11-14 pages = extension = .txt mime = text/plain words = 12557 sentences = 598 flesch = 43 summary = Viruses are widely used as vectors for heterologous gene expression in cultured cells or natural hosts, and therefore a large number of viruses with exogenous sequences inserted into their genomes have been engineered. Viruses genera covered in relevant studies Conclusions of this review All viruses • Inserted sequences are often unstable and rapidly lost upon passaging of an engineered virus • The position at which a sequence is integrated in the genome can be important for stability • Sequence stability is not an intrinsic property of genomes because demographic parameters, such as population size and bottleneck size, can have important effects on sequence stability • The multiplicity of cellular infection affects sequence stability, and can in some cases directly affect whether there is selection for deletion variants • Deletions are not the only class of mutations that can reduce the cost of inserted sequences, although they are the most common I: dsDNA cache = ./cache/cord-302047-vv5gpldi.txt txt = ./txt/cord-302047-vv5gpldi.txt === reduce.pl bib === id = cord-310406-5pvln91x author = Asbury, Thomas M title = Genome3D: A viewer-model framework for integrating and visualizing multi-scale epigenomic information within a three-dimensional genome date = 2010-09-02 pages = extension = .txt mime = text/plain words = 3014 sentences = 189 flesch = 44 summary = RESULTS: We have applied object-oriented technology to develop a downloadable visualization tool, Genome3D, for integrating and displaying epigenomic data within a prescribed three-dimensional physical model of the human genome. 
In addition, in spite of the many recent efforts to measure and model the genome structure at various resolutions and detail [3] [4] [5] [6] [7] [8] [9] [10], little work has focused on combining these models into a plausible aggregate, or has taken advantage of the large amount of genomic and epigenomic data available from new high-throughput approaches. The viewer is designed to display data from multiple scales and uses a hierarchical model of the relative positions of all nucleotide atoms in the cell nucleus, i.e., the complete physical genome. An integrated physical genome model can show the interplay between histone modifications and other genomic data, such as SNPs, DNA methylation, the structure of gene, promoter and transcription machinery, etc. In addition to epigenomic data, the physical genome model also provides a platform to visualize high-throughput gene expression data and its interplay with global binding information of transcription factors. cache = ./cache/cord-310406-5pvln91x.txt txt = ./txt/cord-310406-5pvln91x.txt === reduce.pl bib === id = cord-015850-ef6svn8f author = Saitou, Naruya title = Eukaryote Genomes date = 2013-08-22 pages = extension = .txt mime = text/plain words = 7424 sentences = 484 flesch = 53 summary = General overviews of eukaryote genomes are first discussed, including organelle genomes, introns, and junk DNAs. We then discuss the evolutionary features of eukaryote genomes, such as genome duplication, C-value paradox, and the relationship between genome size and mutation rates. Most of the protein coding genes of melon mitochondrial DNAs are highly similar to those of its congeneric species, which are watermelon and squash whose mitochondrial genome sizes are 119 kb and 125 kb, respectively. There are various genomic features that are specific to eukaryotes other than existence of introns and junk DNAs, such as genome duplication, RNA editing, C-value paradox, and the relationship between genome size and mutation rates. 
The Perigord black truffle (Tuber melanosporum), shown as A in Fig. 8.9, has the largest genome size (~125 Mb) among the 88 fungi species whose genome sequences were so far determined, yet the number of genes is only ~7,500 [81]. cache = ./cache/cord-015850-ef6svn8f.txt txt = ./txt/cord-015850-ef6svn8f.txt === reduce.pl bib === id = cord-340423-f8ab7413 author = Barr, J.N. title = Genetic Instability of RNA Viruses date = 2016-09-09 pages = extension = .txt mime = text/plain words = 9777 sentences = 454 flesch = 45 summary = We then discuss evidence that at least some RNA viruses have a replication fidelity that is poised to maximize genome sequence space without incurring catastrophic lethal mutations and describe how this can be exploited to control viral infections. The error-prone nature of polymerase activity, coupled with the absence of a proofreading mechanism, is the key reason why RNA virus genomes acquire mutations and exist as a swarm of genetic variants. The mutation rate of the viral polymerase, coupled with the replication mode that the virus employs (and extrinsic factors, described in the following text) will determine the extent of genetic variability of viruses released from an infected cell. Thus, it is possible that the high mutation rates of RNA viruses are simply a consequence of polymerases that are under selective pressure to replicate genomes very rapidly to ensure efficient viral infection [79] [80] [81]. 
cache = ./cache/cord-340423-f8ab7413.txt txt = ./txt/cord-340423-f8ab7413.txt === reduce.pl bib === id = cord-275683-1qj9ri18 author = Roux, Simon title = Metagenomics in Virology date = 2019-06-12 pages = extension = .txt mime = text/plain words = 5891 sentences = 225 flesch = 37 summary = Against the background of an extensive viral diversity revealed by metagenomics across many environments, new sequence assembly approaches that reconstruct complete genome sequences from metagenomes have recently revealed surprisingly cosmopolitan viruses in specific ecological niches. However, these techniques can only detect previously known viruses, and often require Box 1 Use of complementary methods to target different types of viruses A number of approaches have been developed to specifically select and survey the genetic material contained by virus particles in a given sample. Virus sequences obtained from "bulk" metagenomes will typically reflect viruses infecting their host cell at the time of sampling, either actively replicating or not, while viromes enables a deeper and more focused exploration of the virus diversity in a specific site or sample. With viral metagenomics being applied to a larger set of samples and environments, and with bioinformatic analyses including genome assembly and interpretation constantly improving, novel groups of dominant and widespread viruses may thus be progressively revealed across many environments. cache = ./cache/cord-275683-1qj9ri18.txt txt = ./txt/cord-275683-1qj9ri18.txt === reduce.pl bib === id = cord-330312-1pjolkql author = Liu, Y.-T. title = Infectious Disease Genomics date = 2017-01-20 pages = extension = .txt mime = text/plain words = 5168 sentences = 327 flesch = 45 summary = One of the important motivations for these efforts is to develop preventative, diagnostic, and therapeutic strategies through the analysis of sequenced microorganisms, parasites, and vectors related to human health. 
16, 17 The genomes of human malaria parasite Plasmodium falciparum and its major mosquito vector Anopheles gambiae were published in 2002. 30–32 Genome-sequencing projects for other important human disease vectors are in progress. 38 One of the similar efforts for human pathogens is the NIH Influenza Genome Sequencing Project. 48 The completed or ongoing genome projects (Table 10.1) provide enormous opportunities for the discovery of novel vaccines and drug targets against human pathogens as well as the improvement of diagnosis and discovery of infectious agents and the development of new strategies for invertebrate vector control. Genome sequence of the human malaria parasite Plasmodium falciparum cache = ./cache/cord-330312-1pjolkql.txt txt = ./txt/cord-330312-1pjolkql.txt === reduce.pl bib === === reduce.pl bib === id = cord-314594-xvc8hvpq author = Singh, Roshan Kumar title = Breeding and biotechnological interventions for trait improvement: status and prospects date = 2020-09-18 pages = extension = .txt mime = text/plain words = 9529 sentences = 472 flesch = 39 summary = Advances in high-throughput genomics strategies at a whole-genome level, including genetic association mapping, map-based cloning, genomic selection, and speed breeding, are also proven useful in improvising genetic gains for expediting the crop improvement processes. Through genome-wide association study (GWAS), 60 loci significantly associated with agronomic traits such as oil content, seed quality, stress tolerance were identified, which may be proven as a valuable resource for genetic improvement (Lu et al. Marker-assisted backcrossing (MABC) is the introgression of a genomic region (QTL or locus or gene) contributing the desired trait from a donor genotype into a breeding line or elite cultivar without linkage drag through backcrossing after multiple generations. 
As the name suggests, CRISPR/Cas9 consists of two components: a single-guide Application of functional and comparative genomics in marker-assisted breeding and biotechnological approaches for crop improvement. The candidate gene(s) identified from functional genomic studies can be introduced through genetic engineering or subjected to targeted modification through genome editing technology in crop species for improved agronomic traits. cache = ./cache/cord-314594-xvc8hvpq.txt txt = ./txt/cord-314594-xvc8hvpq.txt === reduce.pl bib === id = cord-304498-ty41xob0 author = Denison, Mark R title = Coronaviruses: An RNA proofreading machine regulates replication fidelity and diversity date = 2011-03-01 pages = extension = .txt mime = text/plain words = 7332 sentences = 345 flesch = 38 summary = Genetic inactivation of ExoN activity in engineered SARS-CoV and MHV genomes by alanine substitution at conserved DE-D-D active site residues results in viable mutants that demonstrate 15- to 20-fold increases in mutation rates, up to 18 times greater than those tolerated for fidelity mutants of other RNA viruses. The high mutation rates of RNA viruses also render them particularly susceptible to repeated genetic bottleneck events during replication, transmission between hosts or spread within a host, resulting in progressive deviation from the consensus sequence associated with decreased viral fitness and sometimes extinction. cache = ./cache/cord-304498-ty41xob0.txt txt = ./txt/cord-304498-ty41xob0.txt === reduce.pl bib === id = cord-346335-el45v0a5 author = Tan, H.S. 
title = Fourier spectral density of the coronavirus genome date = 2020-08-11 pages = extension = .txt mime = text/plain words = 4646 sentences = 224 flesch = 56 summary = We uncover an interesting, new scaling law for the coronavirus genome: the complexity of the genome scales linearly with the power-law exponent that characterizes the enveloping curve of the low-frequency domain of the spectral density. An example of a seminal paper in this subject is that of Voss in [2] where the author found that the spectral density of the genome of many different species follows a power law of the form 1/k^β in the low-frequency domain, with the exponent β potentially related to the organism's evolutionary category. We develop a few models to characterize the typical spectrum, and in the process stumble upon a linear scaling law between a measure of the complexity of each genome and the power-law exponent that describes the enveloping curve of the low-frequency domain. cache = ./cache/cord-346335-el45v0a5.txt txt = ./txt/cord-346335-el45v0a5.txt === reduce.pl bib === id = cord-318392-r9bbomvk author = Woo, Patrick CY title = Coronavirus HKU15 in respiratory tract of pigs and first discovery of coronavirus quasispecies in 5′-untranslated region date = 2017-06-21 pages = extension = .txt mime = text/plain words = 3771 sentences = 213 flesch = 56 summary = The genomes of two Coronavirus HKU15 strains detected in the nasopharyngeal samples of two different pigs were sequenced following our previous publications 26, 27 with modifications. Divergence times for the Coronavirus HKU15 strains were calculated based on the complete genome sequence data, utilizing the Bayesian Markov chain Monte Carlo method using BEAST 1.8.0 33 with the substitution model GTR (general time-reversible model)+G (gamma-distributed rate variation)+I (estimated proportion of invariable sites), a strict molecular clock, and a constant coalescent. 
In one (S579N) of the two Coronavirus HKU15 genomes that we sequenced in this study, variant sites were observed at four positions; two of them were due to nucleotide substitutions, and the other two were results of indels at mononucleotide polymeric regions (189th and 376th bases). cache = ./cache/cord-318392-r9bbomvk.txt txt = ./txt/cord-318392-r9bbomvk.txt === reduce.pl bib === id = cord-334394-qgyzk7th author = Edgar, Robert C. title = Petabase-scale sequence alignment catalyses viral discovery date = 2020-08-10 pages = extension = .txt mime = text/plain words = 8134 sentences = 423 flesch = 51 summary = To address the ongoing pandemic caused by Severe Acute Respiratory Syndrome Coronavirus 2 and expand the known sequence diversity of viruses, we aligned pangenomes for coronaviruses (CoV) and other viral families to 5.6 petabases of public sequencing data from 3.8 million biologically diverse samples. To expand the known repertoire of viruses and catalyse global virus discovery, in particular for Coronaviridae (CoV) family, we developed the Serratus cloud computing architecture for ultra-high throughput sequence alignment. We aligned 3,837,755 public RNA-seq, meta-genome, meta-virome and meta-transcriptome datasets (termed a sequencing run [5] ) against a collection of viral family pangenomes comprising all GenBank CoV records clustered at 99% identity plus all non-retroviral RefSeq records for vertebrate viruses (see Methods and Extended Table 1 ). We performed de novo assembly on 52,772 runs potentially containing CoV sequencing reads by combining 37,131 SRA accessions identified by the Serratus search with 18,584 identified by an ongoing cataloguing initiative of the SRA called STAT [5] . 
cache = ./cache/cord-334394-qgyzk7th.txt txt = ./txt/cord-334394-qgyzk7th.txt === reduce.pl bib === id = cord-352619-s2x53grh author = Payne, Natalie title = Novel Circoviruses Detected in Feces of Sonoran Felids date = 2020-09-15 pages = extension = .txt mime = text/plain words = 3263 sentences = 177 flesch = 45 summary = Genomes from several families of circular Rep-encoding single-stranded DNA viruses (CRESS-DNA viruses) are part of the phylum Cressdnaviricota [22] and have been identified in fecal samples of other mammals, including domestic cats [23, 24] , bobcats, African lions [25] , capybaras [26] , and Tasmanian devils [27] . Here we used a metagenomic approach to identify novel circoviruses in the feces of two species of Sonoran felids, the puma and bobcat; although not endangered, knowledge of viral threats facing these species could help prevent future population decline, as well as indicate potential threats to the endangered ocelot and jaguar. Based on the species-demarcation threshold for circoviruses which is 80% genome-wide identity [28] , both of these belong to a new species which we refer to as Sonfela (derived from Sonoran felid associated) circovirus 1. As the viral genomes were derived from scat samples, the circoviruses could have infected the bobcat prey species or the felids themselves or be environmentally derived. cache = ./cache/cord-352619-s2x53grh.txt txt = ./txt/cord-352619-s2x53grh.txt === reduce.pl bib === id = cord-316033-xg8eb2nm author = Easton, Alice title = Molecular evidence of hybridization between pig and human Ascaris indicates an interbred species complex infecting humans date = 2020-11-06 pages = extension = .txt mime = text/plain words = 10447 sentences = 522 flesch = 45 summary = suum transcripts (Jex et al., 2011; Wang et al., 2017) to the human Ascaris germline assembly to annotate the genome, identifying and classifying 17,902 protein-coding genes ( Table 1 , Supplementary file 1). 
As this reference-based assembly exhibits the best assembly attributes, including high continuity with a large N50, low gaps and unplaced sequences, and high-quality protein-coding genes (see Table 1 ), we suggest that this version should be used as a reference germline genome for a human Ascaris spp. We next took advantage of the abundant reads from the mitochondrial genome in our sequencing data (on average 7690X coverage, see Supplementary file 1) to perform de novo assembly of 68 complete human Ascaris spp. Furthermore, there were no significant associations between mitochondrial sequence variations and other factors (e.g. village, household, time of worm collection, host) based on PERMANOVA (see methods and Table 2 ) after translating the phylogenetic tree into a distance matrix, suggesting not only a lack of differentiation into distinct species but also a potentially large interbreeding population of worms being transmitted between individuals and across villages. cache = ./cache/cord-316033-xg8eb2nm.txt txt = ./txt/cord-316033-xg8eb2nm.txt === reduce.pl bib === id = cord-022262-ck2lhojz author = Gromeier, Matthias title = Genetics, Pathogenesis and Evolution of Picornaviruses date = 2007-09-02 pages = extension = .txt mime = text/plain words = 28035 sentences = 1423 flesch = 46 summary = The following viruses have been recognized as picornaviruses on the basis of their genome sequences and physico-chemical properties as well as the result of comparative sequence analyses (see the section on Evolution): equine rhinovirus types I and 2, Aichi virus, porcine enterovirus, avian encephalomyelitis virus, infectious flacherie virus of silkworm Clusters of enteroviruses refer to groups of enteroviruses arranged predominantly according to genotypic kinship (Hyypia et al., 1997) . 
Briefly, when expression vectors ( Figure 12 .6E) consisting of a gag gene (encoding p17-p24; 1161 nt) of human immunodeficiency virus that was fused to the N-terminus of the poliovirus polyprotein (Andino et al., 1994; Mueller and Wimmer, 1998) were analysed after transfection into HeLa cells, the genomes were not only found to be severely impaired in viral replication but they were also genetically unstable (Mueller and Wimmer, 1997) . cache = ./cache/cord-022262-ck2lhojz.txt txt = ./txt/cord-022262-ck2lhojz.txt === reduce.pl bib === id = cord-297669-22fctxk4 author = Proudfoot, Chris title = Genome editing for disease resistance in pigs and chickens date = 2019-06-25 pages = extension = .txt mime = text/plain words = 4555 sentences = 237 flesch = 44 summary = The virus was thought to attach to CD169 to be taken up into the cells; however, genome-edited pigs lacking CD169 were not resistant to PRRSV infection (Prather et al., 2013) . Chicken somatic cell lines have been edited to introduce changes to this gene-conferring resistance to avian leucosis virus in vitro (Lee et al., 2017) . However, as the example for avian influenza shows, host genes play an important role in other steps of the pathogen replication cycle and also provide editing targets for disease resilience or resistance. Genome editing allows integration of the disease-resistance trait into a wider selection of pigs, ensuring genetic variability and maintenance of desirable traits. (D) Resistance genes may be identified in laboratory research but not in highly bred lines, making integration into those productive animals only possible using genome editing. She employs genome editing and genetic selection to generate animals genetically resistant to viral disease. 
cache = ./cache/cord-297669-22fctxk4.txt txt = ./txt/cord-297669-22fctxk4.txt ===== Reducing email addresses cord-012473-p66of6kq cord-265857-fs6dj3dp cord-277687-u3q36o3e cord-302047-vv5gpldi cord-297669-22fctxk4 Creating transaction Updating adr table ===== Reducing keywords cord-000012-p56v8wi1 cord-000556-uu1oz2ei cord-004123-1s8kuno2 cord-018437-yjvwa1ot cord-012473-p66of6kq cord-018804-wj35q88f cord-264746-gfn312aa cord-016588-f8uvhstb cord-014461-2ubh9u8r cord-267714-ji88tvsl cord-265329-bsypo08l cord-269124-oreg7rnj cord-265581-pbv8mjfc cord-265857-fs6dj3dp cord-001340-kqcx7lrq cord-000902-ew8orn0z cord-324811-yjwavea5 cord-007923-j3jpqd7k cord-017932-vmtjc8ct cord-005281-wy0zk9p8 cord-348059-wa1gjbck cord-281959-g4sjyytr cord-304607-td0776wj cord-016798-tv2ntug6 cord-016293-pyb00pt5 cord-301709-kvyes2lz cord-298136-mel9fxw8 cord-268795-tjmx6msm cord-003316-r5te5xob cord-277687-u3q36o3e cord-348515-bqqyly23 cord-320005-i30t7cvr cord-302047-vv5gpldi cord-350747-5t5xthk6 cord-015850-ef6svn8f cord-310406-5pvln91x cord-340423-f8ab7413 cord-275683-1qj9ri18 cord-330312-1pjolkql cord-297669-22fctxk4 cord-022128-r8el8nqm cord-304498-ty41xob0 cord-314594-xvc8hvpq cord-346335-el45v0a5 cord-318392-r9bbomvk cord-334394-qgyzk7th cord-316033-xg8eb2nm cord-352619-s2x53grh cord-022262-ck2lhojz Creating transaction Updating wrd table ===== Reducing urls cord-000012-p56v8wi1 cord-000556-uu1oz2ei cord-004123-1s8kuno2 cord-012473-p66of6kq cord-264746-gfn312aa cord-265329-bsypo08l cord-016588-f8uvhstb cord-269124-oreg7rnj cord-265581-pbv8mjfc cord-000902-ew8orn0z cord-007923-j3jpqd7k cord-017932-vmtjc8ct cord-304607-td0776wj cord-016798-tv2ntug6 cord-301709-kvyes2lz cord-003316-r5te5xob cord-277687-u3q36o3e cord-348515-bqqyly23 cord-302047-vv5gpldi cord-015850-ef6svn8f cord-310406-5pvln91x cord-346335-el45v0a5 cord-334394-qgyzk7th cord-316033-xg8eb2nm cord-352619-s2x53grh Creating transaction Updating url table ===== Reducing named entities cord-000556-uu1oz2ei 
cord-000012-p56v8wi1 cord-004123-1s8kuno2 cord-012473-p66of6kq cord-018804-wj35q88f cord-264746-gfn312aa cord-018437-yjvwa1ot cord-016588-f8uvhstb cord-014461-2ubh9u8r cord-267714-ji88tvsl cord-265329-bsypo08l cord-269124-oreg7rnj cord-265581-pbv8mjfc cord-265857-fs6dj3dp cord-001340-kqcx7lrq cord-007923-j3jpqd7k cord-000902-ew8orn0z cord-005281-wy0zk9p8 cord-324811-yjwavea5 cord-017932-vmtjc8ct cord-348059-wa1gjbck cord-304607-td0776wj cord-281959-g4sjyytr cord-016798-tv2ntug6 cord-301709-kvyes2lz cord-298136-mel9fxw8 cord-268795-tjmx6msm cord-003316-r5te5xob cord-277687-u3q36o3e cord-348515-bqqyly23 cord-320005-i30t7cvr cord-302047-vv5gpldi cord-350747-5t5xthk6 cord-015850-ef6svn8f cord-310406-5pvln91x cord-340423-f8ab7413 cord-275683-1qj9ri18 cord-330312-1pjolkql cord-297669-22fctxk4 cord-314594-xvc8hvpq cord-346335-el45v0a5 cord-022128-r8el8nqm cord-304498-ty41xob0 cord-318392-r9bbomvk cord-352619-s2x53grh cord-334394-qgyzk7th cord-316033-xg8eb2nm cord-016293-pyb00pt5 cord-022262-ck2lhojz Creating transaction Updating ent table ===== Reducing parts of speech cord-000556-uu1oz2ei cord-012473-p66of6kq cord-004123-1s8kuno2 cord-267714-ji88tvsl cord-007923-j3jpqd7k cord-014461-2ubh9u8r cord-265581-pbv8mjfc cord-000012-p56v8wi1 cord-018804-wj35q88f cord-265329-bsypo08l cord-001340-kqcx7lrq cord-016588-f8uvhstb cord-000902-ew8orn0z cord-265857-fs6dj3dp cord-005281-wy0zk9p8 cord-348059-wa1gjbck cord-018437-yjvwa1ot cord-264746-gfn312aa cord-324811-yjwavea5 cord-017932-vmtjc8ct cord-281959-g4sjyytr cord-304607-td0776wj cord-269124-oreg7rnj cord-016798-tv2ntug6 cord-301709-kvyes2lz cord-298136-mel9fxw8 cord-268795-tjmx6msm cord-277687-u3q36o3e cord-003316-r5te5xob cord-320005-i30t7cvr cord-348515-bqqyly23 cord-310406-5pvln91x cord-330312-1pjolkql cord-297669-22fctxk4 cord-350747-5t5xthk6 cord-015850-ef6svn8f cord-275683-1qj9ri18 cord-318392-r9bbomvk cord-346335-el45v0a5 cord-302047-vv5gpldi cord-340423-f8ab7413 cord-352619-s2x53grh cord-304498-ty41xob0 
cord-314594-xvc8hvpq cord-316033-xg8eb2nm cord-334394-qgyzk7th cord-022128-r8el8nqm cord-016293-pyb00pt5 cord-022262-ck2lhojz Creating transaction Updating pos table Building ./etc/reader.txt cord-022262-ck2lhojz cord-014461-2ubh9u8r cord-022128-r8el8nqm cord-014461-2ubh9u8r cord-340423-f8ab7413 cord-022128-r8el8nqm number of items: 49 sum of words: 303,485 average size in words: 6,457 average readability score: 45 nouns: genome; virus; sequence; viruses; genomes; genes; gene; sequences; dna; data; analysis; protein; host; species; proteins; recombination; sequencing; replication; disease; number; cells; poliovirus; cell; evolution; mutation; mutations; strains; example; type; information; diversity; time; population; expression; studies; size; infection; regions; rate; research; structure; study; identification; selection; region; reference; polymerase; approach; strain; pathogen verbs: using; identify; includes; based; provide; show; found; containing; associated; known; developed; sequenced; suggests; causes; made; require; revealed; generate; lead; encode; producing; allowed; occur; determine; resulted; increasing; followed; predicted; related; given; expressed; involved; see; described; isolated; considered; compared; detecting; coding; represents; infected; targeted; emerged; indicate; remains; became; performed; studied; appeared; affecting adjectives: viral; human; genetic; high; genomic; new; different; single; many; large; molecular; specific; important; complete; whole; nucleotide; first; non; evolutionary; multiple; infectious; several; small; possible; similar; clinical; available; bacterial; cellular; novel; low; functional; microbial; biological; major; common; present; respiratory; phylogenetic; structural; natural; additional; immune; comparative; unique; like; wide; complex; recent; long adverbs: also; however; well; highly; even; often; therefore; now; currently; still; recently; previously; respectively; rapidly; relatively; rather; closely; 
already; probably; much; first; far; directly; usually; yet; less; interestingly; generally; genetically; approximately; together; indeed; particularly; moreover; finally; especially; widely; newly; subsequently; significantly; furthermore; typically; long; likely; just; completely; later; specifically; potentially; least pronouns: it; we; their; its; they; our; them; his; i; he; us; one; itself; themselves; you; him; p~; her; your; she; himself; u; ourselves; my; https://github.com/ababaian/serratus; mine; https://serratus.io; hadv-4; coronaspades proper nouns: RNA; Genome; SARS; DNA; C; Human; Fig; Virus; China; GenBank; NCBI; CoV-2; PCR; SNP; kb; B; A; Complete; CoV; Yersinia; Coronavirus; WGS; HIV-1; T; Strain; C.; Y.; T.; Project; bp; Wimmer; National; Institute; ExoN; E.; S.; HIV; Figure; Table; picornavirus; IRES; Treponema; S; SNPs; NIAID; Europe; HGP; •; Gene; L. keywords: genome; rna; dna; virus; sequence; gene; human; viral; sars; protein; mutation; sequencing; recombination; pathogen; hgp; disease; yersinia; wimmer; wgs; venter; university; u.s.; treponema; trait; tool; technology; subsp; strain; stability; ssr; sra; spike; snp; serratus; seq; rep; rea; qtl; product; probe; poliovirus; plant; pig; pestis; pcr; patent; pan; pallidum; nih; niaid one topic; one dimension: genome file(s): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3262788/ titles(s): RNA-Seq Based Transcriptional Map of Bovine Respiratory Disease Pathogen “Histophilus somni 2336” three topics; one dimension: genome; genome; genome file(s): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7155501/, https://doi.org/10.1038/s41576-019-0119-1, https://www.ncbi.nlm.nih.gov/pubmed/33155980/ titles(s): Genetics, Pathogenesis and Evolution of Picornaviruses | Ancient pathogen genomics as an emerging tool for infectious disease research | Molecular evidence of hybridization between pig and human Ascaris indicates an interbred species complex infecting humans five topics; three dimensions: genome 
sequence virus; rna virus viruses; genome pallidum human; genome data sequence; data ancient pestis file(s): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7120537/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7155501/, https://www.ncbi.nlm.nih.gov/pubmed/33155980/, https://doi.org/10.1186/1471-2105-10-293, https://doi.org/10.1038/s41576-019-0119-1 titles(s): The flowering of the age of Biotechnology 1990–2000 | Genetics, Pathogenesis and Evolution of Picornaviruses | Molecular evidence of hybridization between pig and human Ascaris indicates an interbred species complex infecting humans | Efficient oligonucleotide probe selection for pan-genomic tiling arrays | Ancient pathogen genomics as an emerging tool for infectious disease research Type: cord title: keyword-genome-cord date: 2021-05-24 time: 23:51 username: emorgan patron: Eric Morgan email: emorgan@nd.edu input: keywords:genome ==== make-pages.sh htm files ==== make-pages.sh complex files ==== make-pages.sh named enities ==== making bibliographics id: cord-310406-5pvln91x author: Asbury, Thomas M title: Genome3D: A viewer-model framework for integrating and visualizing multi-scale epigenomic information within a three-dimensional genome date: 2010-09-02 words: 3014.0 sentences: 189.0 pages: flesch: 44.0 cache: ./cache/cord-310406-5pvln91x.txt txt: ./txt/cord-310406-5pvln91x.txt summary: RESULTS: We have applied object-oriented technology to develop a downloadable visualization tool, Genome3D, for integrating and displaying epigenomic data within a prescribed three-dimensional physical model of the human genome. In addition, in spite of the many recent efforts to measure and model the genome structure at various resolutions and detail [3] [4] [5] [6] [7] [8] [9] [10] , little work has focused on combining these models into a plausible aggregate, or has taken advantage of the large amount of genomic and epigenomic data available from new high-throughput approaches. 
The viewer is designed to display data from multiple scales and uses a hierarchical model of the relative positions of all nucleotide atoms in the cell nucleus, i.e., the complete physical genome. An integrated physical genome model can show the interplay between histone modifications and other genomic data, such as SNPs, DNA methylation, the structure of gene, promoter and transcription machinery, etc. In addition to epigenomic data, the physical genome model also provides a platform to visualize high-throughput gene expression data and its interplay with global binding information of transcription factors. abstract: BACKGROUND: New technologies are enabling the measurement of many types of genomic and epigenomic information at scales ranging from the atomic to nuclear. Much of this new data is increasingly structural in nature, and is often difficult to coordinate with other data sets. There is a legitimate need for integrating and visualizing these disparate data sets to reveal structural relationships not apparent when looking at these data in isolation. RESULTS: We have applied object-oriented technology to develop a downloadable visualization tool, Genome3D, for integrating and displaying epigenomic data within a prescribed three-dimensional physical model of the human genome. In order to integrate and visualize large volumes of data, novel statistical and mathematical approaches have been developed to reduce the size of the data. To our knowledge, this is the first such tool developed that can visualize the human genome in three dimensions. We describe here the major features of Genome3D and discuss our multi-scale data framework using a representative basic physical model. We then demonstrate many of the issues and benefits of multi-resolution data integration. CONCLUSIONS: Genome3D is a software visualization tool that explores a wide range of structural genomic and epigenetic data. 
Data from various sources of differing scales can be integrated within a hierarchical framework that is easily adapted to new developments concerning the structure of the physical genome. In addition, our tool has a simple annotation mechanism to incorporate non-structural information. Genome3D is unique in its ability to manipulate large amounts of multi-resolution data from diverse sources to uncover complex and new structural relationships within the genome. url: https://www.ncbi.nlm.nih.gov/pubmed/20813045/ doi: 10.1186/1471-2105-11-444 id: cord-301709-kvyes2lz author: Baker, Susan C. title: Developing Bioinformatic Resources for Coronaviruses date: 2006 words: 1017.0 sentences: 58.0 pages: flesch: 44.0 cache: ./cache/cord-301709-kvyes2lz.txt txt: ./txt/cord-301709-kvyes2lz.txt summary: The database will contain high-quality curated data: sequence annotations from published whole and partial genomes; relevant experimental data; metabolic pathway data; taxonomic data; literature citations; and a suite of visualization and analysis tools. The results of these programs and searches assembled by the annotation pipeline are used to propose biological features that are also stored in the curation database that uses the Genomics Unified Schema (GUS). For the purposes of defining a minimal, non-redundant set of genes characteristic of the category, one genome (usually the best-known or best-characterized) is identified as the "reference genome"; the remaining members of the class are called "associated genomes." For example, the Tor2 and Urbani isolates were the first two SARS coronavirus genomes to be sequenced and therefore were named as reference genomes. This allows high-value, manually curated information from the corresponding reference genes to be automatically linked to the associated genes, provided minimal similarity criteria based on automated sequence analysis are satisfied. 
abstract: nan url: https://www.ncbi.nlm.nih.gov/pubmed/17037566/ doi: 10.1007/978-0-387-33012-9_70 id: cord-003316-r5te5xob author: Balloux, Francois title: From Theory to Practice: Translating Whole-Genome Sequencing (WGS) into the Clinic date: 2018-12-17 words: 7340.0 sentences: 327.0 pages: flesch: 34.0 cache: ./cache/cord-003316-r5te5xob.txt txt: ./txt/cord-003316-r5te5xob.txt summary: WGS-based strain identification gives a far superior resolution In principle, WGS can provide highly relevant information for clinical microbiology in near-real-time, from phenotype testing to tracking outbreaks. As an example, genome assembly might appear to be a bottleneck for real-time WGS diagnostics, but is probably rarely required; sufficient characterization of an isolate can be made by analysis of the k-mers in the raw sequence data, which is orders of magnitude faster. These include, among others: the current costs of WGS, which remain far from negligible despite a common belief that sequencing costs have plummeted; a lack of training in, and possible cultural resistance to, bioinformatics among clinical microbiologists; a lack of the necessary computational infrastructure in most hospitals; the inadequacy of existing reference microbial genomics databases necessary for reliable AMR and virulence profiling; and the difficulty of setting up effective, standardized, and accredited bioinformatics protocols. abstract: Hospitals worldwide are facing an increasing incidence of hard-to-treat infections. Limiting infections and providing patients with optimal drug regimens require timely strain identification as well as virulence and drug-resistance profiling. Additionally, prophylactic interventions based on the identification of environmental sources of recurrent infections (e.g., contaminated sinks) and reconstruction of transmission chains (i.e., who infected whom) could help to reduce the incidence of nosocomial infections. WGS could hold the key to solving these issues. 
However, uptake in the clinic has been slow. Some major scientific and logistical challenges need to be solved before WGS fulfils its potential in clinical microbial diagnostics. In this review we identify major bottlenecks that need to be resolved for WGS to routinely inform clinical intervention and discuss possible solutions. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6249990/ doi: 10.1016/j.tim.2018.08.004 id: cord-340423-f8ab7413 author: Barr, J.N. title: Genetic Instability of RNA Viruses date: 2016-09-09 words: 9777.0 sentences: 454.0 pages: flesch: 45.0 cache: ./cache/cord-340423-f8ab7413.txt txt: ./txt/cord-340423-f8ab7413.txt summary: We then discuss evidence that at least some RNA viruses have a replication fidelity that is poised to maximize genome sequence space without incurring catastrophic lethal mutations and describe how this can be exploited to control viral infections. The error-prone nature of polymerase activity, coupled with the absence of a proofreading mechanism, is the key reason why RNA virus genomes acquire mutations and exist as a swarm of genetic variants. The mutation rate of the viral polymerase, coupled with the replication mode that the virus employs (and extrinsic factors, described in the following text) will determine the extent of genetic variability of viruses released from an infected cell. Thus, it is possible that the high mutation rates of RNA viruses are simply a consequence of polymerases that are under selective pressure to replicate genomes very rapidly to ensure efficient viral infection [79] [80] [81] . abstract: Despite having very limited coding capacity, RNA viruses are able to withstand challenge of antiviral drugs, cause epidemics in previously exposed human populations, and, in some cases, infect multiple host species. They are able to achieve this by virtue of their ability to multiply very rapidly, coupled with their extraordinary degree of genetic heterogeneity. 
RNA viruses exist not as single genotypes, but as a swarm of related variants, and this genomic diversity is an essential feature of their biology. RNA viruses have a variety of mechanisms that act in combination to determine their genetic heterogeneity. These include polymerase fidelity, error-mitigation mechanisms, genomic recombination, and different modes of genome replication. RNA viruses can vary in their ability to tolerate mutations, or “genetic robustness,” and several factors contribute to this. Finally, there is evidence that some RNA viruses exist close to a threshold where polymerase error rate has evolved to maximize the possible sequence space available, while avoiding the accumulation of a lethal load of deleterious mutations. We speculate that different viruses have evolved different error rates to complement the different “life-styles” they possess. url: https://www.sciencedirect.com/science/article/pii/B9780128033098000021 doi: 10.1016/b978-0-12-803309-8.00002-1 id: cord-000012-p56v8wi1 author: Bigot, Yves title: Molecular evidence for the evolution of ichnoviruses from ascoviruses by symbiogenesis date: 2008-09-18 words: 6419.0 sentences: 293.0 pages: flesch: 44.0 cache: ./cache/cord-000012-p56v8wi1.txt txt: ./txt/cord-000012-p56v8wi1.txt summary: CONCLUSION: Our results provide molecular evidence supporting the origin of ichnoviruses from ascoviruses by lateral transfer of ascoviral genes into ichneumonid wasp genomes, perhaps the first example of symbiogenesis between large DNA viruses and eukaryotic organisms. With respect to both species number and mechanisms that lead to successful parasitism, endoparasitic wasps are known to inject secretions at oviposition, but only a few lineages use viruses or virus-like particles (VLPs) to evade or to suppress host defences. 
Extending our investigations to proteins encoded by open reading frames of certain ascoviruses and bracoviruses, hosts and bacteria, in the light of recent analyses about the involvement of the replication machinery of virus groups related to ascoviruses in lateral gene transfer [29] , we discuss the robustness and the limits of the molecular evidence supporting an ascovirus origin for ichnovirus lineages. abstract: BACKGROUND: Female endoparasitic ichneumonid wasps inject virus-like particles into their caterpillar hosts to suppress immunity. These particles are classified as ichnovirus virions and resemble ascovirus virions, which are also transmitted by parasitic wasps and attack caterpillars. Ascoviruses replicate DNA and produce virions. Polydnavirus DNA consists of wasp DNA replicated by the wasp from its genome, which also directs particle synthesis. Structural similarities between ascovirus and ichnovirus particles and the biology of their transmission suggest that ichnoviruses evolved from ascoviruses, although molecular evidence for this hypothesis is lacking. RESULTS: Here we show that a family of unique pox-D5 NTPase proteins in the Glypta fumiferanae ichnovirus are related to three Diadromus pulchellus ascovirus proteins encoded by ORFs 90, 91 and 93. A new alignment technique also shows that two proteins from a related ichnovirus are orthologs of other ascovirus virion proteins. CONCLUSION: Our results provide molecular evidence supporting the origin of ichnoviruses from ascoviruses by lateral transfer of ascoviral genes into ichneumonid wasp genomes, perhaps the first example of symbiogenesis between large DNA viruses and eukaryotic organisms. 
We also discuss the limits of this evidence through complementary studies, which revealed that passive lateral transfer of viral genes among polydnaviral, bacterial, and wasp genomes may have occurred repeatedly through an intimate coupling of both recombination and replication of viral genomes during evolution. The impact of passive lateral transfers on evolutionary relationships between polydnaviruses and viruses with large double-stranded genomes is considered in the context of the theory of symbiogenesis. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2567993/ doi: 10.1186/1471-2148-8-253 id: cord-005281-wy0zk9p8 author: Blinov, V. M. title: Viral component of the human genome date: 2017-05-09 words: 6583.0 sentences: 306.0 pages: flesch: 44.0 cache: ./cache/cord-005281-wy0zk9p8.txt txt: ./txt/cord-005281-wy0zk9p8.txt summary: In the human genome, this capacity is determined by the portion of chromosomal DNA, which does not contain species-specific protein-encoding sequences and, thus, can basically make a place for novel information that will be modified to reach a new balance. In fact, the scope of the described phenomena is not limited to retroviruses as such, since the ubiquity of retroviral elements in animal genomes, their activity in germline cells [31], along with the fact that viral replication depends significantly on RNA expression, allow retroviruses to contribute in different ways to the insertion of nonretroviral genes into animal germline cells. Finally, the ability to incorporate parts of the viral genome into the chromosomal DNA of host germline cells can vary strongly among different taxonomic groups of viruses, i.e., orders, families, genera, and even species. If insertions of viral sequences remain functionally active in the host cell genome, they can give rise to either proteins that function in a new environment or untranslated RNAs of different sizes. 
abstract: Relationships between viruses and their human host are traditionally described from the point of view taking into consideration hosts as victims of viral aggression, which results in infectious diseases. However, these relations are in fact two-sided and involve modifications of both the virus and host genomes. Mutations that accumulate in the populations of viruses and hosts may provide them advantages such as the ability to overcome defense barriers of host cells or to create more efficient barriers to deal with the attack of the viral agent. One of the most common ways of reinforcing anti-viral barriers is the horizontal transfer of viral genes into the host genome. Within the host genome, these genes may be modified and extensively expressed to compete with viral copies and inhibit the synthesis of their products or modulate their functions in other ways. This review summarizes the available data on the horizontal gene transfer between viral and human genomes and discusses related problems. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7089383/ doi: 10.1134/s0026893317020066 id: cord-012473-p66of6kq author: Celniker, Susan E. title: Unlocking the secrets of the genome date: 2009-06-17 words: 2556.0 sentences: 119.0 pages: flesch: 37.0 cache: ./cache/cord-012473-p66of6kq.txt txt: ./txt/cord-012473-p66of6kq.txt summary: The primary objective of the Human Genome Project was to produce high-quality sequences not just for the human genome but also for those of the chief model organisms: Escherichia coli, yeast (Saccharomyces cerevisiae), worm (Caenorhabditis elegans), fly (Drosophila melanogaster) and mouse (Mus musculus). Free access to the resultant data has prompted much biological research, including development of a map of common human genetic variants (the International HapMap Project) 1 , expression profiling of healthy and diseased cells 2 and in-depth studies of many individual genes. 
On the basis of this experience, the NHGRI launched two complementary programmes in 2007: an expansion of the human ENCODE project to the whole genome (www.genome.gov/ENCODE) and the model organism ENCODE (modENCODE) project to generate a comprehensive annotation of the functional elements in the C. The research communities that study these two organisms will rapidly make use of the modENCODE results, deploying powerful experimental approaches that are often not possible or practical in mammals, including genetic, genomic, transgenic, biochemical and RNAi assays. abstract: Despite the successes of genomics, little is known about how genetic information produces complex organisms. A look at the crucial functional elements of fly and worm genomes could change that. SUPPLEMENTARY INFORMATION: The online version of this article (doi:10.1038/459927a) contains supplementary material, which is available to authorized users. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2843545/ doi: 10.1038/459927a id: cord-304498-ty41xob0 author: Denison, Mark R title: Coronaviruses: An RNA proofreading machine regulates replication fidelity and diversity date: 2011-03-01 words: 7332.0 sentences: 345.0 pages: flesch: 38.0 cache: ./cache/cord-304498-ty41xob0.txt txt: ./txt/cord-304498-ty41xob0.txt summary: Genetic inactivation of ExoN activity in engineered SARS-CoV and MHV genomes by alanine substitution at conserved DE-D-D active site residues results in viable mutants that demonstrate 15- to 20-fold increases in mutation rates, up to 18 times greater than those tolerated for fidelity mutants of other RNA viruses. 
The high mutation rates of RNA viruses also render them particularly susceptible to repeated genetic bottleneck events during replication, transmission between hosts or spread within a host, resulting in progressive deviation from the consensus sequence associated with decreased viral fitness and sometimes extinction. abstract: In order to survive and propagate, RNA viruses must achieve a balance between the capacity for adaptation to new environmental conditions or host cells with the need to maintain an intact and replication competent genome. Several virus families in the order Nidovirales, such as the coronaviruses (CoVs) must achieve these objectives with the largest and most complex replicating RNA genomes known, up to 32 kb of positive-sense RNA. The CoVs encode sixteen nonstructural proteins (nsp 1–16) with known or predicted RNA synthesis and modification activities, and it has been proposed that they are also responsible for the evolution of large genomes. The CoVs, including murine hepatitis virus (MHV) and SARS-CoV, encode a 3′-to-5′ exoribonuclease activity (ExoN) in nsp14. Genetic inactivation of ExoN activity in engineered SARS-CoV and MHV genomes by alanine substitution at conserved DE-D-D active site residues results in viable mutants that demonstrate 15- to 20-fold increases in mutation rates, up to 18 times greater than those tolerated for fidelity mutants of other RNA viruses. Thus nsp14-ExoN is essential for replication fidelity, and likely serves either as a direct mediator or regulator of a more complex RNA proofreading machine, a process previously unprecedented in RNA virus biology. Elucidation of the mechanisms of nsp14-mediated proofreading will have major implications for our understanding of the evolution of RNA viruses, and also will provide a robust model to investigate the balance between fidelity, diversity and pathogenesis. 
The discovery of a protein distinct from a viral RdRp that regulates replication fidelity also raises the possibility that RNA genome replication fidelity may be adaptable to differing replication environments and selective pressures, rather than being a fixed determinant. url: https://www.ncbi.nlm.nih.gov/pubmed/21593585/ doi: 10.4161/rna.8.2.15013 id: cord-022128-r8el8nqm author: Domingo, Esteban title: Molecular basis of genetic variation of viruses: error-prone replication date: 2019-11-08 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: Genetic variation is a necessity of all biological systems. Viruses use all known mechanisms of variation; mutation, several forms of recombination, and segment reassortment in the case of viruses with a segmented genome. These processes are intimately connected with the replicative machineries of viruses, as well as with fundamental physical-chemical properties of nucleotides when acting as template or substrate residues. Recombination has been viewed as a means to rescue viable genomes from unfit parents or to produce large modifications for the exploration of phenotypic novelty. All types of genetic variation can act conjointly as blind processes to provide the raw materials for adaptation to the changing environments in which viruses must replicate. A distinction is made between mechanistically unavoidable and evolutionarily relevant mutation and recombination. 
url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7153327/ doi: 10.1016/b978-0-12-816331-3.00002-7 id: cord-316033-xg8eb2nm author: Easton, Alice title: Molecular evidence of hybridization between pig and human Ascaris indicates an interbred species complex infecting humans date: 2020-11-06 words: 10447.0 sentences: 522.0 pages: flesch: 45.0 cache: ./cache/cord-316033-xg8eb2nm.txt txt: ./txt/cord-316033-xg8eb2nm.txt summary: suum transcripts (Jex et al., 2011; Wang et al., 2017) to the human Ascaris germline assembly to annotate the genome, identifying and classifying 17,902 protein-coding genes ( Table 1 , Supplementary file 1). As this reference-based assembly exhibits the best assembly attributes, including high continuity with a large N50, low gaps and unplaced sequences, and high-quality protein-coding genes (see Table 1 ), we suggest that this version should be used as a reference germline genome for a human Ascaris spp. We next took advantage of the abundant reads from the mitochondrial genome in our sequencing data (on average 7690X coverage, see Supplementary file 1) to perform de novo assembly of 68 complete human Ascaris spp. Furthermore, there were no significant associations between mitochondrial sequence variations and other factors (e.g. village, household, time of worm collection, host) based on PERMANOVA (see methods and Table 2 ) after translating the phylogenetic tree into a distance matrix, suggesting not only a lack of differentiation into distinct species but also a potentially large interbreeding population of worms being transmitted between individuals and across villages. abstract: Human ascariasis is a major neglected tropical disease caused by the nematode Ascaris lumbricoides. We report a 296 megabase (Mb) reference-quality genome comprised of 17,902 protein-coding genes derived from a single, representative Ascaris worm. An additional 68 worms were collected from 60 human hosts in Kenyan villages where pig husbandry is rare. 
Notably, the majority of these worms (63/68) possessed mitochondrial genomes that clustered closer to the pig parasite Ascaris suum than to A. lumbricoides. Comparative phylogenomic analyses identified over 11 million nuclear-encoded SNPs but just two distinct genetic types that had recombined across the genomes analyzed. The nuclear genomes had extensive heterozygosity, and all samples existed as genetic mosaics with either A. suum-like or A. lumbricoides-like inheritance patterns supporting a highly interbred Ascaris species genetic complex. As no barriers appear to exist for anthroponotic transmission of these ‘hybrid’ worms, a one-health approach to control the spread of human ascariasis will be necessary. url: https://www.ncbi.nlm.nih.gov/pubmed/33155980/ doi: 10.7554/elife.61562 id: cord-334394-qgyzk7th author: Edgar, Robert C. title: Petabase-scale sequence alignment catalyses viral discovery date: 2020-08-10 words: 8134.0 sentences: 423.0 pages: flesch: 51.0 cache: ./cache/cord-334394-qgyzk7th.txt txt: ./txt/cord-334394-qgyzk7th.txt summary: To address the ongoing pandemic caused by Severe Acute Respiratory Syndrome Coronavirus 2 and expand the known sequence diversity of viruses, we aligned pangenomes for coronaviruses (CoV) and other viral families to 5.6 petabases of public sequencing data from 3.8 million biologically diverse samples. To expand the known repertoire of viruses and catalyse global virus discovery, in particular for Coronaviridae (CoV) family, we developed the Serratus cloud computing architecture for ultra-high throughput sequence alignment. We aligned 3,837,755 public RNA-seq, meta-genome, meta-virome and meta-transcriptome datasets (termed a sequencing run [5] ) against a collection of viral family pangenomes comprising all GenBank CoV records clustered at 99% identity plus all non-retroviral RefSeq records for vertebrate viruses (see Methods and Extended Table 1 ). 
We performed de novo assembly on 52,772 runs potentially containing CoV sequencing reads by combining 37,131 SRA accessions identified by the Serratus search with 18,584 identified by an ongoing cataloguing initiative of the SRA called STAT [5] . abstract: Public sequence data represents a major opportunity for viral discovery, but its exploration has been inhibited by a lack of efficient methods for searching this corpus, which is currently at the petabase scale and growing exponentially. To address the ongoing pandemic caused by Severe Acute Respiratory Syndrome Coronavirus 2 and expand the known sequence diversity of viruses, we aligned pangenomes for coronaviruses (CoV) and other viral families to 5.6 petabases of public sequencing data from 3.8 million biologically diverse samples. To implement this strategy, we developed a cloud computing architecture, Serratus, tailored for ultra-high throughput sequence alignment at the petabase scale. From this search, we identified and assembled thousands of CoV and CoV-like genomes and genome fragments ranging from known strains to putatively novel genera. We generalise this strategy to other viral families, identifying several novel deltaviruses and huge bacteriophages. To catalyse a new era of viral discovery we made millions of viral alignments and family identifications freely available to the research community. Expanding the known diversity and zoonotic reservoirs of CoV and other emerging pathogens can accelerate vaccine and therapeutic developments for the current pandemic, and help us anticipate and mitigate future ones. 
url: https://doi.org/10.1101/2020.08.07.241729 doi: 10.1101/2020.08.07.241729 id: cord-016798-tv2ntug6 author: Gautam, Ablesh title: Bioinformatics Applications in Advancing Animal Virus Research date: 2019-06-06 words: 6978.0 sentences: 405.0 pages: flesch: 44.0 cache: ./cache/cord-016798-tv2ntug6.txt txt: ./txt/cord-016798-tv2ntug6.txt summary: The chapter further provides information on the tools that can be used to study viral epidemiology, phylogenetic analysis, structural modelling of proteins, epitope recognition and open reading frame (ORF) recognition and tools that enable to analyse host-viral interactions, gene prediction in the viral genome, etc. This chapter will introduce virologists to some of the common as well as virus-specific bioinformatics tools that researchers can use to analyse viral sequence data to elucidate the viral dynamics, evolution and preventive therapeutics. Novel virus types comprise new CDSs that are different from previously known CDSs. There are multiple databases and tools available for analysis of human viruses; however, there are still only a limited number of resources designed specifically for veterinary viruses. VIRsiRNAdb is an online curated repository that stores experimentally validated research data of siRNA and short hairpin RNA (shRNA) targeting diverse genes of 42 important human viruses, including influenza virus (Tyagi et al. abstract: Viruses serve as infectious agents for all living entities. There have been various research groups that focus on understanding the viruses in terms of their host-viral relationships, pathogenesis and immune evasion. However, with the current advances in the field of science, now the research field has widened up at the ‘omics’ level. Apparently, generation of viral sequence data has been increasing. 
There are numerous bioinformatics tools available that not only aid in analysing such sequence data but also aid in deducing useful information that can be exploited in developing preventive and therapeutic measures. This chapter elaborates on bioinformatics tools that are specifically designed for animal viruses as well as other generic tools that can be exploited to study animal viruses. The chapter further provides information on the tools that can be used to study viral epidemiology, phylogenetic analysis, structural modelling of proteins, epitope recognition and open reading frame (ORF) recognition and tools that enable to analyse host-viral interactions, gene prediction in the viral genome, etc. Various databases that organize information on animal and human viruses have also been described. The chapter will converse on overview of the current advances, online and downloadable tools and databases in the field of bioinformatics that will enable the researchers to study animal viruses at gene level. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7121192/ doi: 10.1007/978-981-13-9073-9_23 id: cord-017932-vmtjc8ct author: Georgiev, Vassil St. title: Genomic and Postgenomic Research date: 2009 words: 8476.0 sentences: 360.0 pages: flesch: 36.0 cache: ./cache/cord-017932-vmtjc8ct.txt txt: ./txt/cord-017932-vmtjc8ct.txt summary: The family Enterobacteriaceae encompasses a diverse group of bacteria including many of the most important human pathogens (Salmonella, Yersinia, Klebsiella, Shigella), as well as one of the most enduring laboratory research organisms, the nonpathogenic Escherichia coli K12. To this end, NIAID has made significant investments in large-scale sequencing projects, including projects to sequence the complete genomes of many pathogens, such as the bacteria that cause tuberculosis, gonorrhea, chlamydia, and cholera, as well as organisms that are considered agents of bioterrorism. 
The availability of microbial and human DNA sequences opens up new opportunities and allows scientists to perform functional analyses of genes and proteins in whole genomes and cells, as well as the host's immune response and an individual's genetic susceptibility to pathogens. The PFGRC was established in 2001 to provide and distribute to the broader research community a wide range of genomic resources, reagents, data, and technologies for the functional analysis of microbial pathogens and invertebrate vectors of infectious diseases. abstract: The word genomics was first coined by T. Roderick from the Jackson Laboratories in 1986 as the name for the new field of science focused on the analysis and comparison of complete genome sequences of organisms and related high-throughput technologies. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7122628/ doi: 10.1007/978-1-60327-297-1_25 id: cord-348059-wa1gjbck author: Gibbs, Richard A. title: The Human Genome Project changed everything date: 2020-08-07 words: 1732.0 sentences: 91.0 pages: flesch: 54.0 cache: ./cache/cord-348059-wa1gjbck.txt txt: ./txt/cord-348059-wa1gjbck.txt summary: Thirty years on from the launch of the Human Genome Project, Richard Gibbs reflects on the promises that this voyage of discovery bore. He developed basic methods for DNA and mutation analysis and was an early contributor to the Human Genome Project (HGP), leading one of five sites that generated the majority of the sequence. The power of advances in genomics and computers was revealed in the spectacular series of post-HGP projects that were of comparable scale. 
Some still tally the success of the HGP from lists of new drugs or therapies and argue that world-changing examples in biology, such as the spectacular advances of gene editing tools or the expansion of cancer therapeutics through targeted immunotherapy, are largely based on microbial, cellular and animal studies rather than genomics. abstract: Thirty years on from the launch of the Human Genome Project, Richard Gibbs reflects on the promises that this voyage of discovery bore. Its success should be measured by how this project transformed the rules of research, the way of practising biological discovery and the ubiquitous digitization of biological science. url: https://www.ncbi.nlm.nih.gov/pubmed/32770171/ doi: 10.1038/s41576-020-0275-3 id: cord-350747-5t5xthk6 author: Gmyl, A. P. title: Diverse Mechanisms of RNA Recombination date: 2005 words: 8187.0 sentences: 463.0 pages: flesch: 41.0 cache: ./cache/cord-350747-5t5xthk6.txt txt: ./txt/cord-350747-5t5xthk6.txt summary: It was believed until recently that the only possible mechanism of RNA recombination is replicative template switching, with synthesis of a complementary strand starting on one viral RNA molecule and being completed on another. An illustrative example of deletions is provided by defective interfering (DI) genomes, which accumulate in a virus population upon high-multiplicity infections and lack a fragment of the sequence coding for viral proteins [5] [6] [7] . A special role in the variation of RNA viruses is played by recombination, the generation of new genomes from two or more parental RNAs. Recombination between viral RNA molecules was observed for the first time as early as in the 1960s in the poliovirus [14, 15] . In other words, it is possible to assume that some of the mechanisms of nonreplicative RNA recombination play an important role in the evolution of not only viral, but also cell genomes [51, 90] . 
abstract: Recombination is widespread among RNA viruses, but many molecular mechanisms of this phenomenon are still poorly understood. It was believed until recently that the only possible mechanism of RNA recombination is replicative template switching, with synthesis of a complementary strand starting on one viral RNA molecule and being completed on another. The newly synthesized RNA is a primary recombinant molecule in this case. Recent studies have revealed other mechanisms of replicative RNA recombination. In addition, recombination between the genomes of RNA viruses can be nonreplicative, resulting from a joining of preexisting parental molecules. Recombination is a potent tool providing for both the variation and conservation of the genome in RNA viruses. Replicative and nonreplicative mechanisms may contribute differently to each of these evolutionary processes. In the form of trans splicing, nonreplicative recombination of cell RNAs plays an important role in at least some organisms. It is conceivable that RNA recombination continues to contribute to the evolution of DNA genomes. url: https://doi.org/10.1007/s11008-005-0069-x doi: 10.1007/s11008-005-0069-x id: cord-022262-ck2lhojz author: Gromeier, Matthias title: Genetics, Pathogenesis and Evolution of Picornaviruses date: 2007-09-02 words: 28035.0 sentences: 1423.0 pages: flesch: 46.0 cache: ./cache/cord-022262-ck2lhojz.txt txt: ./txt/cord-022262-ck2lhojz.txt summary: The following viruses have been recognized as picornaviruses on the basis of their genome sequences and physico-chemical properties as well as the result of comparative sequence analyses (see the section on Evolution): equine rhinovirus types I and 2, Aichi virus, porcine enterovirus, avian encephalomyelitis virus, infectious flacherie virus of silkworm Clusters of enteroviruses refer to groups of enteroviruses arranged predominantly according to genotypic kinship (Hyypia et al., 1997) . 
Briefly, when expression vectors ( Figure 12 .6E) consisting of a gag gene (encoding p17-p24; 1161 nt) of human immunodeficiency virus that was fused to the N-terminus of the poliovirus polyprotein (Andino et al., 1994; Mueller and Wimmer, 1998) were analysed after transfection into HeLa cells, the genomes were not only found to be severely impaired in viral replication but they were also genetically unstable (Mueller and Wimmer, 1997) . abstract: The discovery of viruses heralded an exciting new era for research in the medical and biological sciences. It has been realized that the cellular receptor guiding a virus to a target cell cannot be the sole determinant of a virus's pathogenic potential. Comparative analyses of the structures of genomes and their products have placed the picornaviruses into a large “picorna-like” virus family, in which they occupy a prominent place. Most human picornavirus infections are self-limiting, yet the enormously high rate of picornavirus infections in the human population can lead to a significant incidence of disease complications that may be permanently debilitating or even fatal. Picornaviruses employ one of the simplest imaginable genetic systems: they consist of single-stranded RNA that encodes only a single multidomain polypeptide, the polyprotein. The RNA is packaged into a small, rigid, naked, and icosahedral virion whose proteins are unmodified except for a myristate at the N-termini of VP4. The RNA itself does not contain modified bases. The key to ultimately understanding picornaviruses may be to rationalize the huge amount of information about these viruses from the perspective of evolution. It is possible that the replicative apparatus of picornaviruses originated in the precellular world and was subsequently refined in the course of thousands of generations in a slowly evolving environment. 
Picornaviruses cultivated the art of adaptation, which has allowed them to “jump” into new niches offered in the biological world. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7155501/ doi: 10.1016/b978-012220360-2/50013-1 id: cord-267714-ji88tvsl author: JAKUPCIAK, JOHN P. title: Biological agent detection technologies date: 2009-04-21 words: 3526.0 sentences: 175.0 pages: flesch: 35.0 cache: ./cache/cord-267714-ji88tvsl.txt txt: ./txt/cord-267714-ji88tvsl.txt summary: PCR-based methods have critical limitations, since they depend on a priori knowledge of what sequence to detect in a sample further complicated by recent demonstrations of greater variability in genomic sequence than expected. A platform for genome identification of a specimen from any source must not only be sensitive and specific, but must also detect a variety of pathogens with high accuracy, including modified or previously uncharacterized agents, and this challenge is daunting when identification must be achieved using nucleic acids in a complex sample matrix. The build-out of genome identification DNA sequencing technology in the form of practical instrumentation will be achieved by incorporating the critical requirements for accurate long reads, without dependency for template amplification, capable of manipulating terabytes of data to provide reliable and useful identification of genetic sequences within any unknown sample, whether clinical, environmental, or other type of specimen. abstract: The challenge for first responders, physicians in the emergency room, public health personnel, as well as for food manufacturers, distributors and retailers is accurate and reliable identification of pathogenic agents and their corresponding diseases. This is the weakest point in biological agent detection capability today. There is intense research for new molecular detection technologies that could be used for very accurate detection of pathogens that would be a concern to first responders. 
These include the need for sensors for multiple applications as varied as understanding the ecology of pathogenic micro‐organisms, forensics, environmental sampling for detect‐to‐treat applications, biological sensors for ‘detect to warn’ in infrastructure protection, responses to reports of ‘suspicious powders’, and customs and borders enforcement, to cite a few examples. The benefits of accurate detection include saving millions of dollars annually by reducing disruption of the workforce and the national economy and improving delivery of correct countermeasures to those who are most in need of the information to provide protective and/or response measures. url: https://doi.org/10.1111/j.1755-0998.2009.02632.x doi: 10.1111/j.1755-0998.2009.02632.x id: cord-004123-1s8kuno2 author: Jaiswal, Arun Kumar title: The pan-genome of Treponema pallidum reveals differences in genome plasticity between subspecies related to venereal and non-venereal syphilis date: 2020-01-10 words: 5934.0 sentences: 363.0 pages: flesch: 51.0 cache: ./cache/cord-004123-1s8kuno2.txt txt: ./txt/cord-004123-1s8kuno2.txt summary: pallidum strains isolated from different parts of the world and a diverse range of hosts were comparatively analysed using pan-genomic strategy. pertenue, we found differences in the presence/absence of pathogenicity islands (PAIs) and genomic islands (GIs) on subsp.-based study. In this work, we perform a pan-genome approach to better understand the differences of Treponema pallidum infections in the broad spectrum and how genome plasticity is related to the symptom patterns. Finally, we provide insights into the specific subsets (singletons and the pan- and core genomes) of 53 genomes of T. pallidum strains and correlate these subsets with the plasticity of pathogenicity islands and virulence genes. 
The subspecies responsible for non-venereal syphilis is Treponema pallidum subsp. Genes which are present in pallidum subspecies pathogenicity islands (PAIs) or genomic islands (GIs) are absent in the subspecies endemicum and pertenue. abstract: BACKGROUND: Spirochetal organisms of the Treponema genus are responsible for causing Treponematoses. The pathogenic treponeme is a Gram-negative, motile spirochete that causes syphilis in humans. Treponema pallidum subsp. endemicum (TEN) causes endemic syphilis (bejel); T. pallidum subsp. pallidum (TPA) causes venereal syphilis; T. pallidum subsp. pertenue (TPE) causes yaws; and T. pallidum subsp. carateum causes pinta. Out of these four high morbidity diseases, venereal syphilis is mediated by sexual contact; the other three diseases are transmitted by close personal contact. The global distribution of syphilis is alarming and there is an increasing need of proper treatment and preventive measures. Unfortunately, effective measures are limited. RESULTS: Here, the genome sequences of 53 T. pallidum strains isolated from different parts of the world and a diverse range of hosts were comparatively analysed using pan-genomic strategy. Phylogenomic, pan-genomic, core genomic and singleton analysis disclosed the close connection among all strains of the pathogen T. pallidum, its clonal behaviour and showed increases in the sizes of the pan-genome. Based on the genome plasticity analysis of the subsets containing the subspecies T. pallidum subsp. pallidum, T. pallidum subsp. endemicum and T. pallidum subsp. pertenue, we found differences in the presence/absence of pathogenicity islands (PAIs) and genomic islands (GIs) on subsp.-based study. CONCLUSIONS: In summary, we identified four pathogenicity islands (PAIs) and eight genomic islands (GIs) in subsp. pallidum, whereas subsp. endemicum has three PAIs and seven GIs and subsp. pertenue harbours three PAIs and eight GIs. 
Concerning the presence of genes in PAIs and GIs, we found some genes related to lipid and amino acid biosynthesis that were only present in the subsp. of T. pallidum, compared to T. pallidum subsp. endemicum and T. pallidum subsp. pertenue. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6953169/ doi: 10.1186/s12864-019-6430-6 id: cord-324811-yjwavea5 author: Kidgell, Claire title: Elucidating genetic diversity with oligonucleotide arrays date: 2005 words: 5561.0 sentences: 268.0 pages: flesch: 37.0 cache: ./cache/cord-324811-yjwavea5.txt txt: ./txt/cord-324811-yjwavea5.txt summary: Oligonucleotide microarrays, predominantly high-density oligonucleotide arrays, have emerged as the principal platforms for performing genome-wide diversity analysis. Since a number of complex issues still remain with high-throughput microarray-based SNP genotyping in humans, in the remainder of this review, we will discuss the application of high-density oligonucleotide arrays to elucidate genetic diversity, with particular focus on studies undertaken with Saccharomyces cerevisiae (Winzeler et al. falciparum (Clark 2002), the genome-wide analysis facilitated by hybridization of genomic DNA to the Affymetrix microarray identified significant differences in potential selection pressure across different gene families and locations within the chromosome (Volkman et al. Although SNPs and deletions can be readily identified using Affymetrix high-density arrays, more complex types of genetic diversity may also be determined using this platform. abstract: DNA microarrays, initially designed to measure gene expression levels, also provide an ideal platform for determining genetic diversity. Oligonucleotide microarrays, predominantly high-density oligonucleotide arrays, have emerged as the principal platforms for performing genome-wide diversity analysis. They have wide-ranging potential applications including comparative genomics, polymorphism discovery and genotyping. 
The identification of inheritable genetic markers also permits the analysis of quantitative traits, population studies and linkage analysis. In this review, we will discuss the application of oligonucleotide arrays, in particular high-density oligonucleotide arrays for elucidating genetic diversity and highlight some of the directions that the field may take. url: https://www.ncbi.nlm.nih.gov/pubmed/15868417/ doi: 10.1007/s10577-005-1503-6 id: cord-000556-uu1oz2ei author: Kumar, Ranjit title: RNA-Seq Based Transcriptional Map of Bovine Respiratory Disease Pathogen “Histophilus somni 2336” date: 2012-01-20 words: 4407.0 sentences: 235.0 pages: flesch: 46.0 cache: ./cache/cord-000556-uu1oz2ei.txt txt: ./txt/cord-000556-uu1oz2ei.txt summary: Whole genome transcriptome analysis is a complementary method to identify "novel" genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. Therefore, genome structural annotation or the identification and demarcation of boundaries of functional elements in a genome (e.g., genes, non-coding RNAs, proteins, and regulatory elements) are critical elements in infectious disease systems biology. Whole genome transcriptome studies (such as whole genome tiling arrays [13, 14, 15] and high throughput sequencing [16, 17]) are complementary experimental approaches for bacterial genome annotation and can identify "novel" genes, gene boundaries, regulatory regions, intergenic regions, and operon structures. We compared the RNA-Seq based transcriptome map with the available genome annotation to identify expressed, novel, and intergenic regions in the genome. The single nucleotide resolution map helped uncover the structure and complexity of this pathogen's transcriptome and led to the identification of novel, small RNAs and protein coding genes as well as gene co-expression.
abstract: Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify “novel” genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 using RNA-Seq method. The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome of which 82 sRNAs were never predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent in the current genome annotation. The transcriptome map allowed the identification of 278 operon (total 730 genes) structures in the genome. When compared with the genome sequence of a non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic region unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3262788/ doi: 10.1371/journal.pone.0029435 id: cord-001340-kqcx7lrq author: Ladner, Jason T. 
title: Standards for Sequencing Viral Genomes in the Era of High-Throughput Sequencing date: 2014-06-17 words: 2512.0 sentences: 121.0 pages: flesch: 40.0 cache: ./cache/cord-001340-kqcx7lrq.txt txt: ./txt/cord-001340-kqcx7lrq.txt summary: Genome sequences play a critical role in our understanding of viral evolution, disease epidemiology, surveillance, diagnosis, and countermeasure development and thus represent valuable resources which must be properly documented and curated to ensure future utility. Here, we outline a set of viral genome quality standards, similar in concept to those proposed for large DNA genomes (4) but focused on the particular challenges of and needs for research on small RNA/ DNA viruses, including characterization of the genomic diversity inherent in all viral samples/populations. Therefore, we have used technology-agnostic criteria to define five standard categories designed to encompass the levels of completeness most often encountered in viral sequencing projects. There is a trend toward requiring a complete genome sequence when a description of a novel virus is being published, and we agree that this is a good goal; however, the amount of time and resources required to complete the last 1 to 2% of a viral genome is often cost and time prohibitive for projects sequencing a large number of samples, and in most cases the very ends of the segments are not essential for proper identification and characterization. abstract: Thanks to high-throughput sequencing technologies, genome sequencing has become a common component in nearly all aspects of viral research; thus, we are experiencing an explosion in both the number of available genome sequences and the number of institutions producing such data. However, there are currently no common standards used to convey the quality, and therefore utility, of these various genome sequences. 
Here, we propose five “standard” categories that encompass all stages of viral genome finishing, and we define them using simple criteria that are agnostic to the technology used for sequencing. We also provide genome finishing recommendations for various downstream applications, keeping in mind the cost-benefit trade-offs associated with different levels of finishing. Our goal is to define a common vocabulary that will allow comparison of genome quality across different research groups, sequencing platforms, and assembly techniques. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4068259/ doi: 10.1128/mbio.01360-14 id: cord-330312-1pjolkql author: Liu, Y.-T. title: Infectious Disease Genomics date: 2017-01-20 words: 5168.0 sentences: 327.0 pages: flesch: 45.0 cache: ./cache/cord-330312-1pjolkql.txt txt: ./txt/cord-330312-1pjolkql.txt summary: One of the important motivations for these efforts is to develop preventative, diagnostic, and therapeutic strategies through the analysis of sequenced microorganisms, parasites, and vectors related to human health. 16, 17 The genomes of human malaria parasite Plasmodium falciparum and its major mosquito vector Anopheles gambiae were published in 2002. 30e32 Genome-sequencing projects for other important human disease vectors are in progress. 38 One of the similar efforts for human pathogens is the NIH Influenza Genome Sequencing Project. 48 The completed or ongoing genome projects (Table 10 .1) provide enormous opportunities for the discovery of novel vaccines and drug targets against human pathogens as well as the improvement of diagnosis and discovery of infectious agents and the development of new strategies for invertebrate vector control. Genome sequence of the human malaria parasite Plasmodium falciparum abstract: The history and development of infectious disease genomics have been closely associated with the Human Genome Project (HGP) during the past 20 years. 
It has been emphasized since the beginning of the HGP that such effort must not be restricted to the human genome and should include other organisms including mouse, bacteria, yeast, fruit fly, and worm for comparative sequence analyses. A brief history is reviewed in this chapter. As of 2016, more than 7000 completed genome sequencing projects have been reported. One of the important motivations for these efforts is to develop preventative, diagnostic, and therapeutic strategies through the analysis of sequenced microorganisms, parasites, and vectors related to human health. A number of examples are discussed in this chapter. url: https://www.sciencedirect.com/science/article/pii/B978012799942500010X doi: 10.1016/b978-0-12-799942-5.00010-x id: cord-265857-fs6dj3dp author: Liu, Yu-Tsueng title: Infectious Disease Genomics date: 2010-12-24 words: 4341.0 sentences: 233.0 pages: flesch: 45.0 cache: ./cache/cord-265857-fs6dj3dp.txt txt: ./txt/cord-265857-fs6dj3dp.txt summary: The completed or ongoing genome projects will provide enormous opportunities for the discovery of novel vaccines and drug targets against human pathogens as well as the improvement of diagnosis and discovery of infectious agents and the development of new strategies for invertebrate vector control. The genomes of human malaria parasite Plasmodium falciparum and its major mosquito vector Anopheles gambiae were published in 2002 (Gardner et al., 2002; Holt et al., 2002) . Genome sequencing projects for other important human disease vectors are in progress Megy et al., 2009 ). One of the similar efforts for human pathogens is the NIH Influenza Genome Sequencing Project. The completed or ongoing genome projects (Table 10 .1) will provide enormous opportunities for the discovery of novel vaccines and drug targets against human pathogens as well as the improvement of diagnosis and discovery of infectious agents and the development of new strategies for invertebrate vector control. 
abstract: The history and development of infectious disease genomics are discussed in this chapter. HGP must not be restricted to the human genome and should include model organisms including mouse, bacteria, yeast, fruit fly, and worm. The completed or ongoing genome projects will provide enormous opportunities for the discovery of novel vaccines and drug targets against human pathogens as well as the improvement of diagnosis and discovery of infectious agents and the development of new strategies for invertebrate vector control. The polysaccharide capsule is important for meningococci to escape from complement-mediated killing. With the completion of the genome sequence of a virulent MenB strain, a “reverse vaccinology” approach was applied for the development of a universal MenB vaccine by Novartis. The indispensable fatty acid synthase (FAS) pathway in bacteria has been regarded as a promising target for the development of antimicrobial agents. Through a systematic screening of 250,000 natural product extracts, a Merck team identified a potent and broad-spectrum antibiotic, platensimycin, which is derived from Streptomyces platensis. Vector Biology Network was formed to achieve three goals (1) to develop basic tools for the stable transformation of anopheline mosquitoes by the year 2000; (2) to engineer a mosquito incapable of carrying the malaria parasite by 2005; and (3) to run controlled experiments to test how to drive the engineered genotype into wild mosquito populations by 2010. The most immediate impact of a completely sequenced pathogen genome is for infectious disease diagnosis. 
url: https://www.sciencedirect.com/science/article/pii/B9780123848901000108 doi: 10.1016/b978-0-12-384890-1.00010-8 id: cord-018804-wj35q88f author: Lázaro, Ester title: Genetic Variability in RNA Viruses: Consequences in Epidemiology and in the Development of New Stratgies for the Extinction of Infectivity date: 2007 words: 8510.0 sentences: 398.0 pages: flesch: 44.0 cache: ./cache/cord-018804-wj35q88f.txt txt: ./txt/cord-018804-wj35q88f.txt summary: High error prone replication, together with the short replication times and large population sizes typical of RNA viruses, instead of being a handicap for survival provides an extraordinary evolutionary advantage by permitting the generation of a wide reservoir of mutants with different phenotypic properties [7] . However, the fact that DNA organisms, which usually live in constant environments, have evolved corrector activities, whereas RNA viruses have not, suggests that replication with high error rates is a selected character that strongly favours viral adaptation to fast changing conditions. Quasi-species replicating during a long time in a near-constant environment in the absence of large population size fluctuations can present a low rate of fixation of mutations in the consensus sequence, despite the continuous occurrence of mutants that is characteristic of the underlying dynamics of the population. The infection of a new host constitutes a sudden change in the environment in which viral replication takes place, usually with the consequence of a drastic decrease in the average fitness of the virus population, which prevents further transmission. 
abstract: nan url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7123777/ doi: 10.1007/978-3-540-35306-5_15 id: cord-018437-yjvwa1ot author: Mitchell, Michael title: Taxonomy date: 2013-08-26 words: 9283.0 sentences: 561.0 pages: flesch: 48.0 cache: ./cache/cord-018437-yjvwa1ot.txt txt: ./txt/cord-018437-yjvwa1ot.txt summary: Classification is based on the genomic nucleic acid used by the virus (DNA or RNA), strandedness (single or double stranded), and method of replication. The nucleocapsids of some viruses are surrounded by envelopes composed of lipid bilayers and host- or viral-encoded proteins. The sequence of negative-sense ssRNA is complementary to the coding sequence for translation, so mRNA must be synthesized by RNA polymerase, typically carried within the virion, before translation into viral proteins. Among the families of viruses able to infect humans and other vertebrate hosts, there are many species that target and cause disease in the lung. The nucleocapsid is surrounded by an envelope derived from host-cell membrane and viral envelope proteins, including hepatitis B surface antigen. The genome of human parainfluenza viruses is ~15 kb in length with an organization and six reading frames (N, P, M, F, HN, L) typical of the Paramyxoviridae (Karron and Collins 2007). abstract: This chapter addresses the classification and taxonomy of viruses with special attention to viruses that show pneumotropic properties. Information provided in this chapter supplements that provided in other chapters in Parts II–V of this volume that discuss individual viral pathogens.
url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7123310/ doi: 10.1007/978-3-642-40605-8_3 id: cord-264746-gfn312aa author: Muse, Spencer title: GENOMICS AND BIOINFORMATICS date: 2012-03-29 words: 10976.0 sentences: 583.0 pages: flesch: 58.0 cache: ./cache/cord-264746-gfn312aa.txt txt: ./txt/cord-264746-gfn312aa.txt summary: The success of this project (it came in almost 3 years ahead of time and 10% under budget, while at the same time providing more data than originally planned) depended on innovations in a variety of areas: breakthroughs in basic molecular biology to allow manipulation of DNA and other compounds; improved engineering and manufacturing technology to produce equipment for reading the sequences of DNA; advances in robotics and laboratory automation; development of statistical methods to interpret data from sequencing projects; and the creation of specialized computing hardware and software systems to circumvent massive computational barriers that faced genome scientists. Although the list of important biotechnologies changes on an almost daily basis, there are three prominent data types in today's environment: (1) genome sequences provide the starting point that allows scientists to begin understanding the genetic underpinnings of an organism; (2) measurements of gene expression levels facilitate studies of gene regulation, which, among other things, help us to understand how an organism's genome interacts with its environment; and (3) genetic polymorphisms are variations from individual to individual within species, and understanding how these variations correlate with phenotypes such as disease susceptibility is a crucial element of modern biomedical research. abstract: This chapter discusses the basic principles of molecular biology regarding genome science and describes the major types of data involved in genome projects, including technologies for collecting them.
Genome science is heavily driven by new technological advances that allow for rapid and inexpensive collection of various types of data. The emergence of genomic science has not simply provided a rich set of tools and data for studying molecular biology. It has been the catalyst for an astounding burst of interdisciplinary research, and it has challenged long-established hierarchies found in most institutions of higher learning. The next generation of biologists needs to be as comfortable at a computer workstation as they are at the lab bench. Recognizing this fact, many universities have already reorganized their departments and their curricula to accommodate the demands of genomic science. The chapter discusses practical applications and uses of genomic data. For example, gene therapies that can repair genetic defects are in the foreseeable future. url: https://api.elsevier.com/content/article/pii/B978012238662650015X doi: 10.1016/b978-0-12-238662-6.50015-x id: cord-014461-2ubh9u8r author: Nelson, Oranmiyan W. title: Genome sequences published outside of Standards in Genomic Sciences, July - October 2012 date: 2012-10-10 words: 4124.0 sentences: 454.0 pages: flesch: 44.0 cache: ./cache/cord-014461-2ubh9u8r.txt txt: ./txt/cord-014461-2ubh9u8r.txt summary: Complete Genome Sequence of Brucella abortus A13334, a New Strain Isolated from the Fetal Gastric Fluid of Dairy Cattle Complete Genome Sequence of Brucella canis Strain HSK A52141, Isolated from the Blood of an Infected Dog Complete Genome Sequence of Streptococcus salivarius PS4, a Strain Isolated from Human Milk Complete Genome Sequences of Probiotic Strains Bifidobacterium animalis subsp.
Complete Genome Sequence of Corynebacterium pseudotuberculosis Strain 1/06-A, Isolated from a Horse in North America Complete Genome Sequence of Bacteriophage BC-611 Specifically Infecting Enterococcus faecalis Strain NP-10011 Characterization and Complete Genome Sequence of Human Coronavirus NL63 Isolated in China Complete Genome Sequence of a Novel Pararetrovirus Isolated from Soybean Complete Genome Sequence of a Polyomavirus Isolated from Horses Complete Genome Sequence of a Novel Porcine Sapelovirus Strain YC2011 Isolated from Piglets with Diarrhea Draft Genome Sequence of Aspergillus oryzae Strain 3.042 abstract: The purpose of this table is to provide the community with a citable record of publications of ongoing genome sequencing projects that have led to a publication in the scientific literature. While our goal is to make the list complete, there is no guarantee that we have not omitted one or more publications appearing in this time frame. Readers and authors who wish to have publications added to subsequent versions of this list are invited to provide the bibliographic data for such references to the SIGS editorial office. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3570808/ doi: 10.4056/sigs.3416907 id: cord-016293-pyb00pt5 author: Newell-McGloughlin, Martina title: The flowering of the age of Biotechnology 1990–2000 date: 2006 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: nan url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7120537/ doi: 10.1007/1-4020-5149-2_4 id: cord-007923-j3jpqd7k author: O'Brien, Stephen J.
title: Cats date: 2004-12-14 words: 1212.0 sentences: 59.0 pages: flesch: 44.0 cache: ./cache/cord-007923-j3jpqd7k.txt txt: ./txt/cord-007923-j3jpqd7k.txt summary: Wild cats dominate their habitat but require vast expanses to survive, which explains the tragic depredation such that every species of Felidae, except the domestic cat, is considered either endangered or threatened in the wild today by CITES, IUCN Red Book and other monitors of the world's most endangered species. Domestic cats and dogs enjoy more medical scrutiny than any species except humans. The cat offers the promise of a second carnivore species (in addition to the dog, which shares a common ancestor with cats dating back to approximately 60 million years ago) to improve human genome annotation, as well as to complement the biomedical and genomic discoveries that make the feline genome attractive. The conserved genome of the cat is retained in the other 36 Felidae species, as well as most of the 246 species of the Carnivora order, the only reshuffled exceptions occurring in the dog and bear families. abstract: nan url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7127084/ doi: 10.1016/j.cub.2004.11.017 id: cord-298136-mel9fxw8 author: O'Malley, Maureen A. title: Whole-genome patenting date: 2005-05-10 words: 4106.0 sentences: 189.0 pages: flesch: 44.0 cache: ./cache/cord-298136-mel9fxw8.txt txt: ./txt/cord-298136-mel9fxw8.txt summary: Gene patenting is now a familiar commercial practice, but there is little awareness that several patents claim ownership of the complete genome sequence of a prokaryote or virus. However, further analysis reveals that patent specifications describing whole-genome inventions use arguments that imply that genomes are qualitatively different from individual genes.
This standard allows several sub-inventions to be linked together by a common "general inventive concept", but prevents unrelated inventions from succeeding as a single Abstract | Gene patenting is now a familiar commercial practice, but there is little awareness that several patents claim ownership of the complete genome sequence of a prokaryote or virus. If there are any qualitative differences between patents for whole genomes and those for DNA fragments, it seems likely that they will be found in the utility arguments - the most contested feature of recent gene patenting. abstract: Gene patenting is now a familiar commercial practice, but there is little awareness that several patents claim ownership of the complete genome sequence of a prokaryote or virus. When these patents are analysed and compared to those for other biological entities, it becomes clear that genome patents seek to exploit the genome as an information base and are part of a broader shift towards intangible intellectual property in genomics. url: https://www.ncbi.nlm.nih.gov/pubmed/15883589/ doi: 10.1038/nrg1613 id: cord-320005-i30t7cvr author: Pardo, A. title: The Human Genome and Advances in Medicine: Limits and Future Prospects date: 2004-03-31 words: 4919.0 sentences: 211.0 pages: flesch: 49.0 cache: ./cache/cord-320005-i30t7cvr.txt txt: ./txt/cord-320005-i30t7cvr.txt summary: The HGP's initial objectives were fulfilled 2 years ahead of schedule, and, in addition to compiling a highly accurate sequence of the human genome which has been made freely available and accessible to everyone, the Consortium has developed a set of new technologies and has constructed genetic maps of the genomes of various organisms.
Around the same time, the public consortium known as the Human Genome Project was formed, and this organization announced a 15-year plan (from 1990 to 2005) with the following objectives: a) to determine the complete nucleotide sequence of human DNA and identify all the genes in human DNA (estimated to number between 50 000 and 100 000); b) to build physical and genetic maps; c) to analyze the genomes of selected organisms used in research as model systems (eg, the mouse); d) to develop new technologies; and e) to analyze and debate the ethical and legal implications for individuals and for society as a whole. abstract: nan url: https://www.sciencedirect.com/science/article/pii/S1579212906700787 doi: 10.1016/s1579-2129(06)70078-7 id: cord-304607-td0776wj author: Paszkiewicz, Konrad H. title: Omics, Bioinformatics, and Infectious Disease Research date: 2010-12-24 words: 7022.0 sentences: 367.0 pages: flesch: 46.0 cache: ./cache/cord-304607-td0776wj.txt txt: ./txt/cord-304607-td0776wj.txt summary: This chapter discusses the current state of play of bioinformatics related to genomics and transcriptomics, briefs metagenomics that finds use in infectious disease research as well as the random sequencing of genomes from a variety of organisms. Bioinformatics plays a key role at several steps in genomics, comparative genomics, and functional genomics: sequence alignment, assembly, identification of single nucleotide polymorphisms (SNP), gene prediction, quantitative analysis of transcription data, etc. The term "metagenomics" was originally used to describe the sequencing of genomes of uncultured microorganisms in order to explore their abilities to produce natural products (Handelsman et al., 1998 , Rondon et al., 2000 and subsequently resulted in novel insights into the ecology and evolution of microorganisms on a scale not imagined possible before (see Cardenas and Tiedje, 2008; Hugenholtz and Tyson, 2008 for an overview). 
However, metagenomics now finds use in infectious disease research as well as the random sequencing of genomes from a variety of organisms from, for example, patient material that could lead to the identification of the cause of disease. abstract: Bioinformatics is basically the study of informatic processes in biotic systems. Actually what constitutes bioinformatics is not entirely clear and arguably varies depending on who tries to define it. This chapter discusses the considerable progress in infectious diseases research that has been made in recent years using various “omics” case studies. Bioinformatics is tasked with making sense of it, mining it, storing it, disseminating it, and ensuring valid biological conclusions can be drawn from it. This chapter discusses the current state of play of bioinformatics related to genomics and transcriptomics, briefs metagenomics that finds use in infectious disease research as well as the random sequencing of genomes from a variety of organisms. This chapter explains the various possibilities of pan-genome, transcriptional reshaping and also enormous progress of proteomics study. Bioinformatic algorithms and tools are crucial tools in analyzing the data. The chapter also attempts to provide some details on the various problems and solution in bioinformatics that current-day scientists face while concentrating on second-generation sequencing strategies. 
url: https://api.elsevier.com/content/article/pii/B9780123848901000182 doi: 10.1016/b978-0-12-384890-1.00018-2 id: cord-352619-s2x53grh author: Payne, Natalie title: Novel Circoviruses Detected in Feces of Sonoran Felids date: 2020-09-15 words: 3263.0 sentences: 177.0 pages: flesch: 45.0 cache: ./cache/cord-352619-s2x53grh.txt txt: ./txt/cord-352619-s2x53grh.txt summary: Genomes from several families of circular Rep-encoding single-stranded DNA viruses (CRESS-DNA viruses) are part of the phylum Cressdnaviricota [22] and have been identified in fecal samples of other mammals, including domestic cats [23, 24] , bobcats, African lions [25] , capybaras [26] , and Tasmanian devils [27] . Here we used a metagenomic approach to identify novel circoviruses in the feces of two species of Sonoran felids, the puma and bobcat; although not endangered, knowledge of viral threats facing these species could help prevent future population decline, as well as indicate potential threats to the endangered ocelot and jaguar. Based on the species-demarcation threshold for circoviruses which is 80% genome-wide identity [28] , both of these belong to a new species which we refer to as Sonfela (derived from Sonoran felid associated) circovirus 1. As the viral genomes were derived from scat samples, the circoviruses could have infected the bobcat prey species or the felids themselves or be environmentally derived. abstract: Sonoran felids are threatened by drought and habitat fragmentation. Vector range expansion and anthropogenic factors such as habitat encroachment and climate change are altering viral evolutionary dynamics and exposure. However, little is known about the diversity of viruses present in these populations. Small felid populations with lower genetic diversity are likely to be most threatened with extinction by emerging diseases, as with other selective pressures, due to having less adaptive potential. 
We used a metagenomic approach to identify novel circoviruses, which may have a negative impact on the population viability, from confirmed bobcat (Lynx rufus) and puma (Puma concolor) scats collected in Sonora, Mexico. Given some circoviruses are known to cause disease in their hosts, such as porcine and avian circoviruses, we took a non-invasive approach using scat to identify circoviruses in free-roaming bobcats and puma. Three circovirus genomes were determined, and, based on the current species demarcation, they represent two novel species. Phylogenetic analyses reveal that one circovirus species is more closely related to rodent associated circoviruses and the other to bat associated circoviruses, sharing highest genome-wide pairwise identity of approximately 70% and 63%, respectively. At this time, it is unknown whether these scat-derived circoviruses infect felids, their prey, or another organism that might have had contact with the scat in the environment. Further studies should be conducted to elucidate the host of these viruses and assess health impacts in felids. url: https://doi.org/10.3390/v12091027 doi: 10.3390/v12091027 id: cord-281959-g4sjyytr author: Phillippy, Adam M title: Efficient oligonucleotide probe selection for pan-genomic tiling arrays date: 2009-09-16 words: 7392.0 sentences: 360.0 pages: flesch: 54.0 cache: ./cache/cord-281959-g4sjyytr.txt txt: ./txt/cord-281959-g4sjyytr.txt summary: The viability of the algorithm is demonstrated by array designs for seven different bacterial pan-genomes and, in particular, the design of a 385,000 probe array that fully tiles the genomes of 20 different Listeria monocytogenes strains with overlapping probes at greater than twofold coverage. In order to both characterize new strains based on genetic content, and detect polymorphism at a higher resolution in small RNAs (sRNAs) and intergenic sequences, the array was required to cover all pan-genomic sequences with a high density of probes. 
To see the similarities between the Pan-Tiling and Minimum Hitting Set problems, let the sequence G be a concatenation of all the genomes from a species, and let W = {w_1, w_2, ..., w_m} be the set of m intervals that results from segmenting G into non-overlapping, end-to-end, length-l windows. abstract: BACKGROUND: Array comparative genomic hybridization is a fast and cost-effective method for detecting, genotyping, and comparing the genomic sequence of unknown bacterial isolates. This method, as with all microarray applications, requires adequate coverage of probes targeting the regions of interest. An unbiased tiling of probes across the entire length of the genome is the most flexible design approach. However, such a whole-genome tiling requires that the genome sequence is known in advance. For the accurate analysis of uncharacterized bacteria, an array must query a fully representative set of sequences from the species' pan-genome. Prior microarrays have included only a single strain per array or the conserved sequences of gene families. These arrays omit potentially important genes and sequence variants from the pan-genome. RESULTS: This paper presents a new probe selection algorithm (PanArray) that can tile multiple whole genomes using a minimal number of probes. Unlike arrays built on clustered gene families, PanArray uses an unbiased, probe-centric approach that does not rely on annotations, gene clustering, or multi-alignments. Instead, probes are evenly tiled across all sequences of the pan-genome at a consistent level of coverage. To minimize the required number of probes, probes conserved across multiple strains in the pan-genome are selected first, and additional probes are used only where necessary to span polymorphic regions of the genome.
The viability of the algorithm is demonstrated by array designs for seven different bacterial pan-genomes and, in particular, the design of a 385,000 probe array that fully tiles the genomes of 20 different Listeria monocytogenes strains with overlapping probes at greater than twofold coverage. CONCLUSION: PanArray is an oligonucleotide probe selection algorithm for tiling multiple genome sequences using a minimal number of probes. It is capable of fully tiling all genomes of a species on a single microarray chip. These unique pan-genome tiling arrays provide maximum flexibility for the analysis of both known and uncharacterized strains. url: https://doi.org/10.1186/1471-2105-10-293 doi: 10.1186/1471-2105-10-293 id: cord-297669-22fctxk4 author: Proudfoot, Chris title: Genome editing for disease resistance in pigs and chickens date: 2019-06-25 words: 4555.0 sentences: 237.0 pages: flesch: 44.0 cache: ./cache/cord-297669-22fctxk4.txt txt: ./txt/cord-297669-22fctxk4.txt summary: The virus was thought to attach to CD169 to be taken up into the cells; however, genome-edited pigs lacking CD169 were not resistant to PRRSV infection (Prather et al., 2013) . Chicken somatic cell lines have been edited to introduce changes to this gene-conferring resistance to avian leucosis virus in vitro (Lee et al., 2017) . However, as the example for avian influenza shows, host genes play an important role in other steps of the pathogen replication cycle and also provide editing targets for disease resilience or resistance. Genome editing allows integration of the disease-resistance trait into a wider selection of pigs, ensuring genetic variability and maintenance of desirable traits. (D) Resistance genes may be identified in laboratory research but not in highly bred lines, making integration into those productive animals only possible using genome editing. She employs genome editing and genetic selection to generate animals genetically resistant to viral disease. 
abstract: nan url: https://doi.org/10.1093/af/vfz013 doi: 10.1093/af/vfz013 id: cord-275683-1qj9ri18 author: Roux, Simon title: Metagenomics in Virology date: 2019-06-12 words: 5891.0 sentences: 225.0 pages: flesch: 37.0 cache: ./cache/cord-275683-1qj9ri18.txt txt: ./txt/cord-275683-1qj9ri18.txt summary: Against the background of an extensive viral diversity revealed by metagenomics across many environments, new sequence assembly approaches that reconstruct complete genome sequences from metagenomes have recently revealed surprisingly cosmopolitan viruses in specific ecological niches. However, these techniques can only detect previously known viruses, and often require Box 1 Use of complementary methods to target different types of viruses A number of approaches have been developed to specifically select and survey the genetic material contained by virus particles in a given sample. Virus sequences obtained from "bulk" metagenomes will typically reflect viruses infecting their host cell at the time of sampling, either actively replicating or not, while viromes enables a deeper and more focused exploration of the virus diversity in a specific site or sample. With viral metagenomics being applied to a larger set of samples and environments, and with bioinformatic analyses including genome assembly and interpretation constantly improving, novel groups of dominant and widespread viruses may thus be progressively revealed across many environments. abstract: Metagenomics, i.e., the sequencing and analysis of genomic information extracted directly from clinical or environmental samples, has become a fundamental tool to explore the viral world. Against the background of an extensive viral diversity revealed by metagenomics across many environments, new sequence assembly approaches that reconstruct complete genome sequences from metagenomes have recently revealed surprisingly cosmopolitan viruses in specific ecological niches. 
Metagenomics is also applied to clinical samples as a non-targeted diagnostic and surveillance tool. By enabling the study of these uncultivated viruses, metagenomics provides invaluable insights into the virus-host interactions, epidemiology, ecology, and evolution of viruses across all ecosystems. url: https://api.elsevier.com/content/article/pii/B9780128096338209576 doi: 10.1016/b978-0-12-809633-8.20957-6 id: cord-015850-ef6svn8f author: Saitou, Naruya title: Eukaryote Genomes date: 2013-08-22 words: 7424.0 sentences: 484.0 pages: flesch: 53.0 cache: ./cache/cord-015850-ef6svn8f.txt txt: ./txt/cord-015850-ef6svn8f.txt summary: General overviews of eukaryote genomes are first discussed, including organelle genomes, introns, and junk DNAs. We then discuss the evolutionary features of eukaryote genomes, such as genome duplication, C-value paradox, and the relationship between genome size and mutation rates. Most of the protein coding genes of melon mitochondrial DNAs are highly similar to those of its congeneric species, which are watermelon and squash whose mitochondrial genome sizes are 119 kb and 125 kb, respectively. There are various genomic features that are specific to eukaryotes other than existence of introns and junk DNAs, such as genome duplication, RNA editing, C-value paradox, and the relationship between genome size and mutation rates. The Perigord black truffle (Tuber melanosporum), shown as A in Fig. 8.9, has the largest genome size (~125 Mb) among the 88 fungi species whose genome sequences were so far determined, yet the number of genes is only ~7,500 [81]. abstract: General overviews of eukaryote genomes are first discussed, including organelle genomes, introns, and junk DNAs. We then discuss the evolutionary features of eukaryote genomes, such as genome duplication, C-value paradox, and the relationship between genome size and mutation rates. Genomes of multicellular organisms, plants, fungi, and animals are then briefly discussed. 
url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7119937/ doi: 10.1007/978-1-4471-5304-7_8 id: cord-268795-tjmx6msm author: Sardar, Rahila title: Comparative analyses of SAR-CoV2 genomes from different geographical locations and other coronavirus family genomes reveals unique features potentially consequential to host-virus interaction and pathogenesis date: 2020-03-21 words: 2257.0 sentences: 128.0 pages: flesch: 47.0 cache: ./cache/cord-268795-tjmx6msm.txt txt: ./txt/cord-268795-tjmx6msm.txt summary: We have performed an integrated sequence-based analysis of SARS-CoV2 genomes from different geographical locations in order to identify its unique features absent in SARS-CoV and other related coronavirus family genomes, conferring unique infection, facilitation of transmission, virulence and immunogenic features to the virus. Our analysis reveals nine host miRNAs which can potentially target SARS-CoV2 genes. Our analysis shows unique host-miRNAs targeting SARS-CoV2 virus genes. The CELLO2GO (7) server was used to infer biological function for each protein of SARS-CoV2 genome with their localization prediction. Assembled SARS-CoV2 genome sequences in FASTA format from India, USA, China, Italy and Nepal were used for coronavirus typing tool analysis. For the phylogenetic analysis, we compared the sequences of 6 SARS-CoV2 isolates from different countries namely, Wuhan, India, Italy, USA and Nepal along with other corona virus species (Figure 1). abstract: The ongoing pandemic of the coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV2). 
We have performed an integrated sequence-based analysis of SARS-CoV2 genomes from different geographical locations in order to identify its unique features absent in SARS-CoV and other related coronavirus family genomes, conferring unique infection, facilitation of transmission, virulence and immunogenic features to the virus. The phylogeny of the genomes yields some interesting results. Systematic gene level mutational analysis of the genomes has enabled us to identify several unique features of the SARS-CoV2 genome, which includes a unique mutation in the spike surface glycoprotein (A930V (24351C>T)) in the Indian SARS-CoV2, absent in other strains studied here. We have also predicted the impact of the mutations in the spike glycoprotein function and stability, using computational approach. To gain further insights into host responses to viral infection, we predict that antiviral host-miRNAs may be controlling the viral pathogenesis. Our analysis reveals nine host miRNAs which can potentially target SARS-CoV2 genes. Interestingly, the nine miRNAs do not have targets in SARS and MERS genomes. Also, hsa-miR-27b is the only unique miRNA which has a target gene in the Indian SARS-CoV2 genome. We also predicted immune epitopes in the genomes url: https://doi.org/10.1101/2020.03.21.001586 doi: 10.1101/2020.03.21.001586 id: cord-277687-u3q36o3e author: Shean, Ryan C. 
title: VAPiD: a lightweight cross-platform viral annotation pipeline and identification tool to facilitate virus genome submissions to NCBI GenBank date: 2019-01-23 words: 4071.0 sentences: 212.0 pages: flesch: 49.0 cache: ./cache/cord-277687-u3q36o3e.txt txt: ./txt/cord-277687-u3q36o3e.txt summary: In order to accept submitted viral genomic data, NCBI GenBank requires 1) viral sequence complete with at least one protein annotation, 2) author/depositor metadata, and 3) viral sequence metadata, such as strain, collection date, collection location, and coverage. VAPiD handles batch submissions of multiple viruses of different types without prior knowledge of the viral species, correctly annotates RNA editing and ribosomal slippage, performs spellchecking on annotations, handles batch or individual submission of metadata, runs with a simple one-line command, and creates annotated viral sequence files for GenBank submission. This first example is the task that the authors originally wrote VAPiD for: annotating large numbers of genomes from different viral species, which mirrors the type of data that many clinical and public health laboratories may encounter. abstract: BACKGROUND: With sequencing technologies becoming cheaper and easier to use, more groups are able to obtain whole genome sequences of viruses of public health and scientific importance. Submission of genomic data to NCBI GenBank is a requirement prior to publication and plays a critical role in making scientific data publicly available. GenBank currently has automatic prokaryotic and eukaryotic genome annotation pipelines but has no viral annotation pipeline beyond influenza virus. Annotation and submission of viral genome sequence is a non-trivial task, especially for groups that do not routinely interact with GenBank for data submissions. 
RESULTS: We present Viral Annotation Pipeline and iDentification (VAPiD), a portable and lightweight command-line tool for annotation and GenBank deposition of viral genomes. VAPiD supports annotation of nearly all unsegmented viral genomes. The pipeline has been validated on human immunodeficiency virus, human parainfluenza virus 1–4, human metapneumovirus, human coronaviruses (229E/OC43/NL63/HKU1/SARS/MERS), human enteroviruses/rhinoviruses, measles virus, mumps virus, Hepatitis A-E Virus, Chikungunya virus, dengue virus, and West Nile virus, as well as the human polyomaviruses BK/JC/MCV, human adenoviruses, and human papillomaviruses. The program can handle individual or batch submissions of different viruses to GenBank and correctly annotates multiple viruses, including those that contain ribosomal slippage or RNA editing without prior knowledge of the virus to be annotated. VAPiD is programmed in Python and is compatible with Windows, Linux, and Mac OS systems. CONCLUSIONS: We have created a portable, lightweight, user-friendly, internet-enabled, open-source, command-line genome annotation and submission package to facilitate virus genome submissions to NCBI GenBank. Instructions for downloading and installing VAPiD can be found at https://github.com/rcs333/VAPiD. url: https://www.ncbi.nlm.nih.gov/pubmed/30674273/ doi: 10.1186/s12859-019-2606-y id: cord-314594-xvc8hvpq author: Singh, Roshan Kumar title: Breeding and biotechnological interventions for trait improvement: status and prospects date: 2020-09-18 words: 9529.0 sentences: 472.0 pages: flesch: 39.0 cache: ./cache/cord-314594-xvc8hvpq.txt txt: ./txt/cord-314594-xvc8hvpq.txt summary: Advances in high-throughput genomics strategies at a whole-genome level, including genetic association mapping, map-based cloning, genomic selection, and speed breeding, are also proven useful in improvising genetic gains for expediting the crop improvement processes. 
Through genome-wide association study (GWAS), 60 loci significantly associated with agronomic traits such as oil content, seed quality, and stress tolerance were identified, which may prove to be a valuable resource for genetic improvement (Lu et al. Marker-assisted backcrossing (MABC) is the introgression of a genomic region (QTL or locus or gene) contributing the desired trait from a donor genotype into a breeding line or elite cultivar without linkage drag through backcrossing after multiple generations. As the name suggests, CRISPR/Cas9 consists of two components: a single-guide Application of functional and comparative genomics in marker-assisted breeding and biotechnological approaches for crop improvement. The candidate gene(s) identified from functional genomic studies can be introduced through genetic engineering or modified in a targeted way through genome editing technology in crop species for improved agronomic traits. abstract: MAIN CONCLUSION: The present review describes the molecular tools and strategies deployed in the trait discovery and improvement of major crops. The prospects and challenges associated with these approaches are discussed. ABSTRACT: Crop improvement relies on modulating the genes and genomic regions underlying key traits, either directly or indirectly. Direct approaches include overexpression, RNA interference, genome editing, etc., while breeding majorly constitutes the indirect approach. With the advent of latest tools and technologies, these strategies could hasten the improvement of crop species. Next-generation sequencing, high-throughput genotyping, precision editing, use of space technology for accelerated growth, etc. have provided a new dimension to crop improvement programmes that work towards delivering better varieties to cope with the challenges. 
Also, studies have widened from understanding the response of plants to single stress to combined stress, which provides insights into the molecular mechanisms regulating tolerance to more than one stress at a given point of time. Altogether, next-generation genetics and genomics had made tremendous progress in delivering improved varieties; however, the scope still exists to expand its horizon to other species that remain underutilized. In this context, the present review systematically analyses the different genomics approaches that are deployed for trait discovery and improvement in major species that could serve as a roadmap for executing similar strategies in other crop species. The application, pros, and cons, and scope for improvement of each approach have been discussed with examples, and altogether, the review provides comprehensive coverage on the advances in genomics to meet the ever-growing demands for agricultural produce. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1007/s00425-020-03465-4) contains supplementary material, which is available to authorized users. url: https://www.ncbi.nlm.nih.gov/pubmed/32948920/ doi: 10.1007/s00425-020-03465-4 id: cord-016588-f8uvhstb author: Sintchenko, Vitali title: Informatics for Infectious Disease Research and Control date: 2009-10-03 words: 8186.0 sentences: 393.0 pages: flesch: 36.0 cache: ./cache/cord-016588-f8uvhstb.txt txt: ./txt/cord-016588-f8uvhstb.txt summary: The goal of infectious disease informatics is to optimize the clinical and public health management of infectious diseases through improvements in the development and use of antimicrobials, the design of more effective vaccines, the identification of biomarkers for life-threatening infections, a better understanding of host-pathogen interactions, and biosurveillance and clinical decision support. 
"New Age" infectious disease informatics rests on advances in microbial genomics, the sequencing and comparative study of the genomes of pathogens, and proteomics or the identification and characterization of their protein related properties and reconstruction of metabolic and regulatory pathways (Bansal 2005) . The figure was produced using Artemis software (The Wellcome Trust Sanger Institute, UK) 1 Informatics for Infectious Disease Research and Control evidence-based gene calling or translating alignments of the DNA sequence to known proteins; and (3) aligning cDNAs from the same or related species. abstract: The goal of infectious disease informatics is to optimize the clinical and public health management of infectious diseases through improvements in the development and use of antimicrobials, the design of more effective vaccines, the identification of biomarkers for life-threatening infections, a better understanding of host-pathogen interactions, and biosurveillance and clinical decision support. Infectious disease informatics can lead to more targeted and effective approaches for the prevention, diagnosis and treatment of infections through a comprehensive review of the genetic repertoire and metabolic profiles of a pathogen. The developments in informatics have been critical in boosting the translational science and in supporting both reductionist and integrative research paradigms. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7120928/ doi: 10.1007/978-1-4419-1327-2_1 id: cord-269124-oreg7rnj author: Spyrou, Maria A. 
title: Ancient pathogen genomics as an emerging tool for infectious disease research date: 2019-04-05 words: 11932.0 sentences: 518.0 pages: flesch: 42.0 cache: ./cache/cord-269124-oreg7rnj.txt txt: ./txt/cord-269124-oreg7rnj.txt summary: Examples of tools that have shown their effectiveness with ancient metagenomic DNA include the widely used Basic Local Alignment Search Tool (BLAST) 68; the MEGAN Alignment Tool (MALT) 41, which involves a taxonomic binning algorithm that can use whole genome databases (such as the National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database 69); Metagenomic Phylogenetic Analysis (MetaPhlAn) 70, which is also integrated into the metagenomic pipeline MetaBIT 71 and uses thousands (or millions) of marker genes for the distinction of specific microbial clades; or Kraken 72, an alignment-free sequence classifier that is based on k-mer matching of a query to a constructed database. Similar limitations can arise when the evolutionary history of a microorganism is vastly affected by recombination, as observed for HBV 44, 53, although HBV molecular dating was recently attempted using a different genomic data set and suggested that the currently explored diversity of Old and New World primate lineages (including all human genotypes) may have emerged within the last 20,000 years 43. abstract: Over the past decade, a genomics revolution, made possible through the development of high-throughput sequencing, has triggered considerable progress in the study of ancient DNA, enabling complete genomes of past organisms to be reconstructed. A newly established branch of this field, ancient pathogen genomics, affords an in-depth view of microbial evolution by providing a molecular fossil record for a number of human-associated pathogens. 
Recent accomplishments include the confident identification of causative agents from past pandemics, the discovery of microbial lineages that are now extinct, the extrapolation of past emergence events on a chronological scale and the characterization of long-term evolutionary history of microorganisms that remain relevant to public health today. In this Review, we discuss methodological advancements, persistent challenges and novel revelations gained through the study of ancient pathogen genomes. url: https://doi.org/10.1038/s41576-019-0119-1 doi: 10.1038/s41576-019-0119-1 id: cord-346335-el45v0a5 author: Tan, H.S. title: Fourier spectral density of the coronavirus genome date: 2020-08-11 words: 4646.0 sentences: 224.0 pages: flesch: 56.0 cache: ./cache/cord-346335-el45v0a5.txt txt: ./txt/cord-346335-el45v0a5.txt summary: We uncover an interesting, new scaling law for the coronavirus genome: the complexity of the genome scales linearly with the power-law exponent that characterizes the enveloping curve of the low-frequency domain of the spectral density. An example of a seminal paper in this subject is that of Voss in [2] where the author found that the spectral density of the genome of many different species follows a power law of the form 1/k^β in the low-frequency domain, with the exponent β potentially related to the organism's evolutionary category. We develop a few models to characterize the typical spectrum, and in the process stumble upon a linear scaling law between a measure of the complexity of each genome and the power-law exponent that describes the enveloping curve of the low-frequency domain. abstract: We present an analysis of the coronavirus RNA genome via a study of its Fourier spectral density based on a binary representation of the nucleotide sequence. We find that at low frequencies, the power spectrum presents a small and distinct departure from the behavior expected from an uncorrelated sequence. 
We provide a couple of simple models to characterize such deviations. Away from a small low-frequency domain, the spectrum presents largely stochastic fluctuations about fixed values which vary inversely with the genome size generally. It exhibits no other peaks apart from those associated with triplet codon usage. We uncover an interesting, new scaling law for the coronavirus genome: the complexity of the genome scales linearly with the power-law exponent that characterizes the enveloping curve of the low-frequency domain of the spectral density. url: https://doi.org/10.1101/2020.06.30.180034 doi: 10.1101/2020.06.30.180034 id: cord-265581-pbv8mjfc author: Tong, Yaojun title: An aurora of natural products-based drug discovery is coming date: 2020-06-06 words: 3077.0 sentences: 155.0 pages: flesch: 46.0 cache: ./cache/cord-265581-pbv8mjfc.txt txt: ./txt/cord-265581-pbv8mjfc.txt summary: With recent scientific advances combining metabolic sciences and technology, multi-omics, big data, combinatorial biosynthesis, synthetic biology, genome editing technology (such as CRISPR), artificial intelligence (AI), and 3D printing, the "high-hanging fruit" is becoming more and more accessible with reduced costs. The incredible rate of development in genome sequencing, modern metabolic engineering, synthetic biology, advanced genome editing, big data, artificial intelligence (AI), and 3D printing together with the growing microbial strain collections enable us to access the previously inaccessible natural products. It starts with genome mining (the analysis of high quality whole genome information), which requires bioinformatics, big data, and even AI; to pathway cloning (refactoring), expression and fermentation, which needs design-build-test-learn (DBTL) cycle-based metabolic engineering; to the target natural product identification, which requires modern chemical analysis; and to later compound modification and clinical studies, which needs biochemistry and cell biology. 
abstract: Natural products (NPs), a nature's reservoir possessing enormous structural and functional diversity far beyond the current ability of chemical synthesis, are now proving themselves as most wonderful gifts from mother nature for human beings. Many of them have been used successfully as medicines, as well as the most important sources of drug leads, food additives, and many industry relevant products for millennia. Most notably, more than half of the antibiotics and anti-cancer drugs currently in use are, or derived from, natural products. However, the speed and outputs of NP-based drug discovery has been slowing down dramatically after the fruitful harvest of the “low-hanging fruit” during the golden age of 1950s-1960s. With recent scientific advances combining metabolic sciences and technology, multi-omics, big data, combinatorial biosynthesis, synthetic biology, genome editing technology (such as CRISPR), artificial intelligence (AI), and 3D printing, the “high-hanging fruit” is becoming more and more accessible with reduced costs. We are now more and more confident that a new age of natural products discovery is dawning. url: https://www.ncbi.nlm.nih.gov/pubmed/32537523/ doi: 10.1016/j.synbio.2020.05.003 id: cord-302047-vv5gpldi author: Willemsen, Anouk title: On the stability of sequences inserted into viral genomes date: 2019-11-14 words: 12557.0 sentences: 598.0 pages: flesch: 43.0 cache: ./cache/cord-302047-vv5gpldi.txt txt: ./txt/cord-302047-vv5gpldi.txt summary: Viruses are widely used as vectors for heterologous gene expression in cultured cells or natural hosts, and therefore a large number of viruses with exogenous sequences inserted into their genomes have been engineered. 
Viruses genera covered in relevant studies Conclusions of this review All viruses • Inserted sequences are often unstable and rapidly lost upon passaging of an engineered virus • The position at which a sequence is integrated in the genome can be important for stability • Sequence stability is not an intrinsic property of genomes because demographic parameters, such as population size and bottleneck size, can have important effects on sequence stability • The multiplicity of cellular infection affects sequence stability, and can in some cases directly affect whether there is selection for deletion variants • Deletions are not the only class of mutations that can reduce the cost of inserted sequences, although they are the most common I: dsDNA abstract: Viruses are widely used as vectors for heterologous gene expression in cultured cells or natural hosts, and therefore a large number of viruses with exogenous sequences inserted into their genomes have been engineered. Many of these engineered viruses are viable and express heterologous proteins at high levels, but the inserted sequences often prove to be unstable over time and are rapidly lost, limiting heterologous protein expression. Although virologists are aware that inserted sequences can be unstable, processes leading to insert instability are rarely considered from an evolutionary perspective. Here, we review experimental work on the stability of inserted sequences over a broad range of viruses, and we present some theoretical considerations concerning insert stability. Different virus genome organizations strongly impact insert stability, and factors such as the position of insertion can have a strong effect. In addition, we argue that insert stability not only depends on the characteristics of a particular genome, but that it will also depend on the host environment and the demography of a virus population. 
The interplay between all factors affecting stability is complex, which makes it challenging to develop a general model to predict the stability of genomic insertions. We highlight key questions and future directions, finding that insert stability is a surprisingly complex problem and that there is need for mechanism-based, predictive models. Combining theoretical models with experimental tests for stability under varying conditions can lead to improved engineering of viral modified genomes, which is a valuable tool for understanding genome evolution as well as for biotechnological applications, such as gene therapy. url: https://www.ncbi.nlm.nih.gov/pubmed/31741748/ doi: 10.1093/ve/vez045 id: cord-318392-r9bbomvk author: Woo, Patrick CY title: Coronavirus HKU15 in respiratory tract of pigs and first discovery of coronavirus quasispecies in 5′-untranslated region date: 2017-06-21 words: 3771.0 sentences: 213.0 pages: flesch: 56.0 cache: ./cache/cord-318392-r9bbomvk.txt txt: ./txt/cord-318392-r9bbomvk.txt summary: The genomes of two Coronavirus HKU15 strains detected in the nasopharyngeal samples of two different pigs were sequenced following our previous publications 26, 27 with modifications. Divergence times for the Coronavirus HKU15 strains were calculated based on the complete genome sequence data, utilizing the Bayesian Markov chain Monte Carlo method using BEAST 1.8.0 33 with the substitution model GTR (general time-reversible model)+G (gamma-distributed rate variation)+I (estimated proportion of invariable sites), a strict molecular clock, and a constant coalescent. In one (S579N) of the two Coronavirus HKU15 genomes that we sequenced in this study, variant sites were observed at four positions; two of them were due to nucleotide substitutions, and the other two were results of indels at mononucleotide polymeric regions (189th and 376th bases). abstract: Coronavirus HKU15 is a deltacoronavirus that was discovered in fecal samples of pigs in Hong Kong in 2012. 
Over the past three years, Coronavirus HKU15 has been widely detected in pigs in East/Southeast Asia and North America and has been associated with fatal outbreaks. In all such epidemiological studies, the virus was generally only detected in fecal/intestinal samples. In this molecular epidemiology study, we detected Coronavirus HKU15 in 9.6% of the nasopharyngeal samples obtained from 249 pigs in Hong Kong. Samples that tested positive were mostly collected during winter. Complete genome sequencing of the Coronavirus HKU15 in two nasopharyngeal samples revealed quasispecies in one of the samples. Two of the polymorphic sites involved indels, but the other two involved transition substitutions. Phylogenetic analysis showed that the two nasopharyngeal strains in the present study were most closely related to the strains PDCoV/CHJXNI2/2015 from Jiangxi, China, and CH/Sichuan/S27/2012 from Sichuan, China. The outbreak strains in the United States possessed highly similar genome sequences and were clustered monophyletically, whereas the Asian strains were more diverse and paraphyletic. The detection of Coronavirus HKU15 in respiratory tracts of pigs implies that in addition to enteric infections, Coronavirus HKU15 may be able to cause respiratory infections in pigs and that in addition to fecal-oral transmission, the virus could possibly spread through the respiratory route. The presence of the virus in respiratory samples provides an alternative clinical sample to confirm the diagnosis of Coronavirus HKU15 infection. Quasispecies were unprecedentedly observed in the 5′-untranslated region of coronavirus genomes. 
url: https://www.ncbi.nlm.nih.gov/pubmed/28634353/ doi: 10.1038/emi.2017.37 id: cord-348515-bqqyly23 author: Zhao, Suhui title: Re-emergent Human Adenovirus Genome Type 7d Caused an Acute Respiratory Disease Outbreak in Southern China After a Twenty-one Year Absence date: 2014-12-08 words: 6569.0 sentences: 323.0 pages: flesch: 43.0 cache: ./cache/cord-348515-bqqyly23.txt txt: ./txt/cord-348515-bqqyly23.txt summary: Recombination analysis reveals this genome differs from the 1950s-era prototype and vaccine strains by a lateral gene transfer, substituting the coding region for the L1 52/55 kDa DNA packaging protein from HAdV-16. Thorough characterization of these pathogens is evidenced by the availability of two genome sequences (JF800905 and JX625134), both of which are further identified as the HAdV-7d genome type in this report, and shown to be nearly identical to this report of an isolate from a 2011 ARD outbreak in Guangdong Province (strain DG01_2011) by comparative genomics and, in particular, in silico REA pattern analysis, as presented in Figure 2 . abstract: Human adenoviruses (HAdVs) are highly contagious pathogens causing acute respiratory disease (ARD), among other illnesses. Of the ARD genotypes, HAdV-7 presents with more severe morbidity and higher mortality than the others. We report the isolation and identification of a genome type HAdV-7d (DG01_2011) from a recent outbreak in Southern China. Genome sequencing, phylogenetic analysis, and restriction endonuclease analysis (REA) comparisons with past pathogens indicate HAdV-7d has re-emerged in Southern China after an absence of twenty-one years. 
Recombination analysis reveals this genome differs from the 1950s-era prototype and vaccine strains by a lateral gene transfer, substituting the coding region for the L1 52/55 kDa DNA packaging protein from HAdV-16. DG01_2011 descends from both a strain circulating in Southwestern China (2010) and a strain from Shaanxi causing a fatality and outbreak (Northwestern China; 2009). Due to the higher morbidity and mortality rates associated with HAdV-7, the surveillance, identification, and characterization of these strains in population-dense China by REA and/or whole genome sequencing are strongly indicated. With these accurate identifications of specific HAdV types and an epidemiological database of regional HAdV pathogens, along with the HAdV genome stability noted across time and space, the development, availability, and deployment of appropriate vaccines are needed. url: https://www.ncbi.nlm.nih.gov/pubmed/25482188/ doi: 10.1038/srep07365 id: cord-000902-ew8orn0z author: Zhao, Xiangyan title: Coevolution between simple sequence repeats (SSRs) and virus genome size date: 2012-08-30 words: 5822.0 sentences: 302.0 pages: flesch: 53.0 cache: ./cache/cord-000902-ew8orn0z.txt txt: ./txt/cord-000902-ew8orn0z.txt summary: The results showed that simple sequence repeats (SSRs) is strongly, positively and significantly correlated with genome size. While, relative abundance and relative density were examined to make the SSRs comparison parallel among differently sized species genomes; principal component analysis (PCA) was designed to investigate which repeat class(es) made a greater contribution to the variance among virus species as well as the relationships between repeat classes. Therefore, the 257 genome sequences were selected as samples for the analysis of relationship between SSRs distribution and genome size in the level of the whole virus. 
We surveyed the distribution of different SSR classes in virus genomes to investigate the relationship between repeat classes (mono-, di-, tri-, tetra-, penta- and hexa-) and genome sequence length. Coevolution between simple sequence repeats (SSRs) and virus genome size abstract: BACKGROUND: Relationship between the level of repetitiveness in genomic sequence and genome size has been investigated by making use of complete prokaryotic and eukaryotic genomes, but relevant studies have been rarely made in virus genomes. RESULTS: In this study, a total of 257 viruses were examined, which cover 90% of genera. The results showed that simple sequence repeats (SSRs) is strongly, positively and significantly correlated with genome size. Certain repeat class is distributed in a certain range of genome sequence length. Mono-, di- and tri- repeats are widely distributed in all virus genomes, tetra- SSRs as a common component consist in genomes which more than 100 kb in size; in the range of genome < 100 kb, genomes containing penta- and hexa- SSRs are not more than 50%. Principal components analysis (PCA) indicated that dinucleotide repeat affects the differences of SSRs most strongly among virus genomes. Results showed that SSRs tend to accumulate in larger virus genomes; and the longer genome sequence, the longer repeat units. CONCLUSIONS: We conducted this research standing on the height of the whole virus. We concluded that genome size is an important factor in affecting the occurrence of SSRs; hosts are also responsible for the variances of SSRs content to a certain degree. 
url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3585866/ doi: 10.1186/1471-2164-13-435 id: cord-265329-bsypo08l author: van Dorp, Lucy title: Emergence of genomic diversity and recurrent mutations in SARS-CoV-2 date: 2020-05-05 words: 4915.0 sentences: 270.0 pages: flesch: 49.0 cache: ./cache/cord-265329-bsypo08l.txt txt: ./txt/cord-265329-bsypo08l.txt summary: Three sites in Orf1ab in the regions encoding Nsp6, Nsp11, Nsp13, and one in the Spike protein are characterised by a particularly large number of recurrent mutations (>15 events) which may signpost convergent evolution and are of particular interest in the context of adaptation of SARS-CoV-2 to the human host. The extraordinary availability of genomic data during the COVID-19 pandemic has been made possible thanks to a tremendous effort by hundreds of researchers globally depositing SARS-CoV-2 assemblies (Table S1) and the proliferation of close to real time data visualisation and analysis tools including NextStrain (https://nextstrain.org) and CoV-GLUE (http://cov-glue.cvr.gla.ac.uk). In this work we use this data to analyse the genomic diversity that has emerged in the global population of SARS-CoV-2 since the beginning of the COVID-19 pandemic, based on a download of 7710 assemblies. The genomic diversity of the global SARS-CoV-2 population being recapitulated in multiple countries points to extensive worldwide transmission of COVID-19, likely from extremely early on in the pandemic. abstract: SARS-CoV-2 is a SARS-like coronavirus of likely zoonotic origin first identified in December 2019 in Wuhan, the capital of China's Hubei province. The virus has since spread globally, resulting in the currently ongoing COVID-19 pandemic. The first whole genome sequence was published on January 5, 2020, and thousands of genomes have been sequenced since this date. 
This resource allows unprecedented insights into the past demography of SARS-CoV-2 but also monitoring of how the virus is adapting to its novel human host, providing information to direct drug and vaccine design. We curated a dataset of 7666 public genome assemblies and analysed the emergence of genomic diversity over time. Our results are in line with previous estimates and point to all sequences sharing a common ancestor towards the end of 2019, supporting this as the period when SARS-CoV-2 jumped into its human host. Due to extensive transmission, the genetic diversity of the virus in several countries recapitulates a large fraction of its worldwide genetic diversity. We identify regions of the SARS-CoV-2 genome that have remained largely invariant to date, and others that have already accumulated diversity. By focusing on mutations which have emerged independently multiple times (homoplasies), we identify 198 filtered recurrent mutations in the SARS-CoV-2 genome. Nearly 80% of the recurrent mutations produced non-synonymous changes at the protein level, suggesting possible ongoing adaptation of SARS-CoV-2. Three sites in Orf1ab in the regions encoding Nsp6, Nsp11, Nsp13, and one in the Spike protein are characterised by a particularly large number of recurrent mutations (>15 events) which may signpost convergent evolution and are of particular interest in the context of adaptation of SARS-CoV-2 to the human host. We additionally provide an interactive user-friendly web-application to query the alignment of the 7666 SARS-CoV-2 genomes. 
url: https://api.elsevier.com/content/article/pii/S1567134820301829 doi: 10.1016/j.meegid.2020.104351 ==== make-pages.sh questions [ERIC WAS HERE] ==== make-pages.sh search /data-disk/reader-compute/reader-cord/bin/make-pages.sh: line 77: /data-disk/reader-compute/reader-cord/tmp/search.htm: No such file or directory Traceback (most recent call last): File "/data-disk/reader-compute/reader-cord/bin/tsv2htm-search.py", line 51, in <module> with open( TEMPLATE, 'r' ) as handle : htm = handle.read() FileNotFoundError: [Errno 2] No such file or directory: '/data-disk/reader-compute/reader-cord/tmp/search.htm' ==== make-pages.sh topic modeling corpus Zipping study carrel