cord-000012-p56v8wi1 2008 CONCLUSION: Our results provide molecular evidence supporting the origin of ichnoviruses from ascoviruses by lateral transfer of ascoviral genes into ichneumonid wasp genomes, perhaps the first example of symbiogenesis between large DNA viruses and eukaryotic organisms. With respect to both species number and mechanisms that lead to successful parasitism, endoparasitic wasps are known to inject secretions at oviposition, but only a few lineages use viruses or virus-like particles (VLPs) to evade or to suppress host defences. Extending our investigations to proteins encoded by open reading frames of certain ascoviruses and bracoviruses, hosts and bacteria, in the light of recent analyses about the involvement of the replication machinery of virus groups related to ascoviruses in lateral gene transfer [29], we discuss the robustness and the limits of the molecular evidence supporting an ascovirus origin for ichnovirus lineages. cord-000556-uu1oz2ei 2012 Whole genome transcriptome analysis is a complementary method to identify "novel" genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. Therefore, genome structural annotation or the identification and demarcation of boundaries of functional elements in a genome (e.g., genes, non-coding RNAs, proteins, and regulatory elements) are critical elements in infectious disease systems biology. Whole genome transcriptome studies (such as whole genome tiling arrays [13, 14, 15] and high throughput sequencing [16, 17]) are complementary experimental approaches for bacterial genome annotation and can identify "novel" genes, gene boundaries, regulatory regions, intergenic regions, and operon structures. We compared the RNA-Seq based transcriptome map with the available genome annotation to identify expressed, novel, and intergenic regions in the genome. 
The single nucleotide resolution map helped uncover the structure and complexity of this pathogen's transcriptome and led to the identification of novel, small RNAs and protein coding genes as well as gene co-expression. cord-000902-ew8orn0z 2012 The results showed that simple sequence repeats (SSRs) are strongly, positively and significantly correlated with genome size. Relative abundance and relative density were examined to make the SSR comparison parallel among differently sized species genomes, while principal component analysis (PCA) was designed to investigate which repeat class(es) made a greater contribution to the variance among virus species as well as the relationships between repeat classes. Therefore, the 257 genome sequences were selected as samples for the analysis of the relationship between SSR distribution and genome size at the level of the whole virus. We surveyed the distribution of different SSR classes in virus genomes to investigate the relationship between repeat classes (mono-, di-, tri-, tetra-, penta- and hexa-) and genome sequence length. Coevolution between simple sequence repeats (SSRs) and virus genome size cord-001340-kqcx7lrq 2014 Genome sequences play a critical role in our understanding of viral evolution, disease epidemiology, surveillance, diagnosis, and countermeasure development and thus represent valuable resources which must be properly documented and curated to ensure future utility. Here, we outline a set of viral genome quality standards, similar in concept to those proposed for large DNA genomes (4) but focused on the particular challenges of and needs for research on small RNA/DNA viruses, including characterization of the genomic diversity inherent in all viral samples/populations. Therefore, we have used technology-agnostic criteria to define five standard categories designed to encompass the levels of completeness most often encountered in viral sequencing projects. 
There is a trend toward requiring a complete genome sequence when a description of a novel virus is being published, and we agree that this is a good goal; however, the amount of time and resources required to complete the last 1 to 2% of a viral genome is often cost and time prohibitive for projects sequencing a large number of samples, and in most cases the very ends of the segments are not essential for proper identification and characterization. cord-003316-r5te5xob 2018 WGS-based strain identification gives a far superior resolution In principle, WGS can provide highly relevant information for clinical microbiology in near-real-time, from phenotype testing to tracking outbreaks. As an example, genome assembly might appear to be a bottleneck for real-time WGS diagnostics, but is probably rarely required; sufficient characterization of an isolate can be made by analysis of the k-mers in the raw sequence data, which is orders of magnitude faster. These include, among others: the current costs of WGS, which remain far from negligible despite a common belief that sequencing costs have plummeted; a lack of training in, and possible cultural resistance to, bioinformatics among clinical microbiologists; a lack of the necessary computational infrastructure in most hospitals; the inadequacy of existing reference microbial genomics databases necessary for reliable AMR and virulence profiling; and the difficulty of setting up effective, standardized, and accredited bioinformatics protocols. cord-004123-1s8kuno2 2020 title: The pan-genome of Treponema pallidum reveals differences in genome plasticity between subspecies related to venereal and non-venereal syphilis pallidum strains isolated from different parts of the world and a diverse range of hosts were comparatively analysed using pan-genomic strategy. pertenue, we found differences in the presence/absence of pathogenicity islands (PAIs) and genomic islands (GIs) on subsp.-based study. 
In this work, we perform a pan-genome approach to better understand the differences of Treponema pallidum infections in the broad spectrum and how genome plasticity is related to the symptom patterns. Finally, we provide insights into the specific subsets (singletons and the pan- and core genomes) of 53 genomes of T. pallidum strains and correlate these subsets with the plasticity of pathogenicity islands and virulence genes. The subspecies responsible for non-venereal syphilis is Treponema pallidum subsp. Genes which are present in pallidum subspecies pathogenicity islands (PAIs) or genomic islands (GIs) are absent in the subspecies endemicum and pertenue. cord-005281-wy0zk9p8 2017 In the human genome, this capacity is determined by the portion of chromosomal DNA, which does not contain species-specific protein-encoding sequences and, thus, can basically make a place for novel information that will be modified to reach a new balance. In fact, the scope of the described phenomena is not limited to retroviruses as such, since the ubiquity of retroviral elements in animal genomes, their activity in germline cells [31], along with the fact that viral replication depends significantly on RNA expression, allow retroviruses to contribute in different ways to the insertion of nonretroviral genes into animal germline cells. Finally, the ability to incorporate parts of the viral genome into the chromosomal DNA of host germline cells can vary strongly among different taxonomic groups of viruses, i.e., orders, families, genera, and even species. If insertions of viral sequences remain functionally active in the host cell genome, they can give rise to either proteins that function in a new environment or untranslated RNAs of different sizes. 
cord-007923-j3jpqd7k 2004 Wild cats dominate their habitat but require vast expanses to survive, which explains the tragic depredation such that every species of Felidae, except the domestic cat, is considered either endangered or threatened in the wild today by CITES, IUCN Red Book and other monitors of the world's most endangered species. Domestic cats and dogs enjoy more medical scrutiny than any species except humans. The cat offers the promise of a second carnivore species (in addition to the dog, which shares a common ancestor with cats dating back to approximately 60 million years ago) to improve human genome annotation, as well as to complement the biomedical and genomic discoveries that make the feline genome attractive. The conserved genome of the cat is retained in the other 36 Felidae species, as well as most of the 246 species of the Carnivora order, the only reshuffled exceptions occurring in the dog and bear families. cord-012473-p66of6kq 2009 The primary objective of the Human Genome Project was to produce high-quality sequences not just for the human genome but also for those of the chief model organisms: Escherichia coli, yeast (Saccharomyces cerevisiae), worm (Caenorhabditis elegans), fly (Drosophila melanogaster) and mouse (Mus musculus). Free access to the resultant data has prompted much biological research, including development of a map of common human genetic variants (the International HapMap Project) 1 , expression profiling of healthy and diseased cells 2 and in-depth studies of many individual genes. On the basis of this experience, the NHGRI launched two complementary programmes in 2007: an expansion of the human ENCODE project to the whole genome (www.genome.gov/ENCODE) and the model organism ENCODE (modENCODE) project to generate a comprehensive annotation of the functional elements in the C. 
The research communities that study these two organisms will rapidly make use of the modENCODE results, deploying powerful experimental approaches that are often not possible or practical in mammals, including genetic, genomic, transgenic, biochemical and RNAi assays. cord-014461-2ubh9u8r 2012 Complete Genome Sequence of Brucella abortus A13334, a New Strain Isolated from the Fetal Gastric Fluid of Dairy Cattle Complete Genome Sequence of Brucella canis Strain HSK A52141, Isolated from the Blood of an Infected Dog Complete Genome Sequence of Streptococcus salivarius PS4, a Strain Isolated from Human Milk Complete Genome Sequences of Probiotic Strains Bifidobacterium animalis subsp. Complete Genome Sequence of Corynebacterium pseudotuberculosis Strain 1/06-A, Isolated from a Horse in North America Complete Genome Sequence of Bacteriophage BC-611 Specifically Infecting Enterococcus faecalis Strain NP-10011 Characterization and Complete Genome Sequence of Human Coronavirus NL63 Isolated in China Complete Genome Sequence of a Novel Pararetrovirus Isolated from Soybean Complete Genome Sequence of a Polyomavirus Isolated from Horses Complete Genome Sequence of a Novel Porcine Sapelovirus Strain YC2011 Isolated from Piglets with Diarrhea Draft Genome Sequence of Aspergillus oryzae Strain 3.042 cord-015850-ef6svn8f 2013 General overviews of eukaryote genomes are first discussed, including organelle genomes, introns, and junk DNAs. We then discuss the evolutionary features of eukaryote genomes, such as genome duplication, C-value paradox, and the relationship between genome size and mutation rates. Most of the protein coding genes of melon mitochondrial DNAs are highly similar to those of its congeneric species, which are watermelon and squash whose mitochondrial genome sizes are 119 kb and 125 kb, respectively. 
There are various genomic features that are specific to eukaryotes other than existence of introns and junk DNAs, such as genome duplication, RNA editing, C-value paradox, and the relationship between genome size and mutation rates. The Perigord black truffle (Tuber melanosporum), shown as A in Fig. 8.9, has the largest genome size (~125 Mb) among the 88 fungi species whose genome sequences were so far determined, yet the number of genes is only ~7,500 [81]. cord-016293-pyb00pt5 2006 cord-016588-f8uvhstb 2009 The goal of infectious disease informatics is to optimize the clinical and public health management of infectious diseases through improvements in the development and use of antimicrobials, the design of more effective vaccines, the identification of biomarkers for life-threatening infections, a better understanding of host-pathogen interactions, and biosurveillance and clinical decision support. "New Age" infectious disease informatics rests on advances in microbial genomics, the sequencing and comparative study of the genomes of pathogens, and proteomics or the identification and characterization of their protein related properties and reconstruction of metabolic and regulatory pathways (Bansal 2005). The figure was produced using Artemis software (The Wellcome Trust Sanger Institute, UK) evidence-based gene calling or translating alignments of the DNA sequence to known proteins; and (3) aligning cDNAs from the same or related species. cord-016798-tv2ntug6 2019 The chapter further provides information on the tools that can be used to study viral epidemiology, phylogenetic analysis, structural modelling of proteins, epitope recognition and open reading frame (ORF) recognition and tools that enable analysis of host-viral interactions, gene prediction in the viral genome, etc. 
This chapter will introduce virologists to some of the common as well as virus-specific bioinformatics tools that researchers can use to analyse viral sequence data to elucidate the viral dynamics, evolution and preventive therapeutics. Novel virus types comprise new CDSs that differ from previously known CDSs. There are multiple databases and tools available for analysis of human viruses; however, there are still only a limited number of resources designed specifically for veterinary viruses. VIRsiRNAdb is an online curated repository that stores experimentally validated research data of siRNA and short hairpin RNA (shRNA) targeting diverse genes of 42 important human viruses, including influenza virus (Tyagi et al. cord-017932-vmtjc8ct 2009 The family Enterobacteriaceae encompasses a diverse group of bacteria including many of the most important human pathogens (Salmonella, Yersinia, Klebsiella, Shigella), as well as one of the most enduring laboratory research organisms, the nonpathogenic Escherichia coli K12. To this end, NIAID has made significant investments in large-scale sequencing projects, including projects to sequence the complete genomes of many pathogens, such as the bacteria that cause tuberculosis, gonorrhea, chlamydia, and cholera, as well as organisms that are considered agents of bioterrorism. The availability of microbial and human DNA sequences opens up new opportunities and allows scientists to perform functional analyses of genes and proteins in whole genomes and cells, as well as the host's immune response and an individual's genetic susceptibility to pathogens. The PFGRC was established in 2001 to provide and distribute to the broader research community a wide range of genomic resources, reagents, data, and technologies for the functional analysis of microbial pathogens and invertebrate vectors of infectious diseases. 
cord-018437-yjvwa1ot 2013 Classification is based on the genomic nucleic acid used by the virus (DNA or RNA), strandedness (single or double stranded), and method of replication. The nucleocapsids of some viruses are surrounded by envelopes composed of lipid bilayers and host- or viral-encoded proteins. The sequence of negative-sense ssRNA is complementary to the coding sequence for translation, so mRNA must be synthesized by RNA polymerase, typically carried within the virion, before translation into viral proteins. Among the families of viruses able to infect humans and other vertebrate hosts, there are many species that target and cause disease in the lung. The nucleocapsid is surrounded by an envelope derived from host-cell membrane and viral envelope proteins, including hepatitis B surface antigen. The genome of human parainfluenza viruses is ~15 kb in length with an organization and six reading frames (N, P, M, F, HN, L) typical of the Paramyxoviridae (Karron and Collins 2007). cord-018804-wj35q88f 2007 Highly error-prone replication, together with the short replication times and large population sizes typical of RNA viruses, instead of being a handicap for survival provides an extraordinary evolutionary advantage by permitting the generation of a wide reservoir of mutants with different phenotypic properties [7]. However, the fact that DNA organisms, which usually live in constant environments, have evolved corrector activities, whereas RNA viruses have not, suggests that replication with high error rates is a selected character that strongly favours viral adaptation to fast changing conditions. Quasi-species replicating during a long time in a near-constant environment in the absence of large population size fluctuations can present a low rate of fixation of mutations in the consensus sequence, despite the continuous occurrence of mutants that is characteristic of the underlying dynamics of the population. 
The infection of a new host constitutes a sudden change in the environment in which viral replication takes place, usually with the consequence of a drastic decrease in the average fitness of the virus population, which prevents further transmission. cord-022128-r8el8nqm 2019 cord-022262-ck2lhojz 2007 The following viruses have been recognized as picornaviruses on the basis of their genome sequences and physico-chemical properties as well as the result of comparative sequence analyses (see the section on Evolution): equine rhinovirus types I and 2, Aichi virus, porcine enterovirus, avian encephalomyelitis virus, infectious flacherie virus of silkworm Clusters of enteroviruses refer to groups of enteroviruses arranged predominantly according to genotypic kinship (Hyypia et al., 1997) . Briefly, when expression vectors ( Figure 12 .6E) consisting of a gag gene (encoding p17-p24; 1161 nt) of human immunodeficiency virus that was fused to the N-terminus of the poliovirus polyprotein (Andino et al., 1994; Mueller and Wimmer, 1998) were analysed after transfection into HeLa cells, the genomes were not only found to be severely impaired in viral replication but they were also genetically unstable (Mueller and Wimmer, 1997) . cord-264746-gfn312aa 2012 The success of this project (it came in almost 3 years ahead of time and 10% under budget, while at the same time providing more data than originally planned) depended on innovations in a variety of areas: breakthroughs in basic molecular biology to allow manipulation of DNA and other compounds; improved engineering and manufacturing technology to produce equipment for reading the sequences of DNA; advances in robotics and laboratory automation; development of statistical methods to interpret data from sequencing projects; and the creation of specialized computing hardware and software systems to circumvent massive computational barriers that faced genome scientists. 
Although the list of important biotechnologies changes on an almost daily basis, there are three prominent data types in today's environment: (1) genome sequences provide the starting point that allows scientists to begin understanding the genetic underpinnings of an organism; (2) measurements of gene expression levels facilitate studies of gene regulation, which, among other things, help us to understand how an organism's genome interacts with its environment; and (3) genetic polymorphisms are variations from individual to individual within species, and understanding how these variations correlate with phenotypes such as disease susceptibility is a crucial element of modern biomedical research. cord-265329-bsypo08l 2020 Three sites in Orf1ab in the regions encoding Nsp6, Nsp11, Nsp13, and one in the Spike protein are characterised by a particularly large number of recurrent mutations (>15 events) which may signpost convergent evolution and are of particular interest in the context of adaptation of SARS-CoV-2 to the human host. The extraordinary availability of genomic data during the COVID-19 pandemic has been made possible thanks to a tremendous effort by hundreds of researchers globally depositing SARS-CoV-2 assemblies (Table S1) and the proliferation of close to real time data visualisation and analysis tools including NextStrain (https://nextstrain.org) and CoV-GLUE (http://cov-glue.cvr.gla.ac.uk). In this work we use this data to analyse the genomic diversity that has emerged in the global population of SARS-CoV-2 since the beginning of the COVID-19 pandemic, based on a download of 7710 assemblies. The genomic diversity of the global SARS-CoV-2 population being recapitulated in multiple countries points to extensive worldwide transmission of COVID-19, likely from extremely early on in the pandemic. 
cord-265581-pbv8mjfc 2020 With recent scientific advances combining metabolic sciences and technology, multi-omics, big data, combinatorial biosynthesis, synthetic biology, genome editing technology (such as CRISPR), artificial intelligence (AI), and 3D printing, the "high-hanging fruit" is becoming more and more accessible with reduced costs. The incredible rate of development in genome sequencing, modern metabolic engineering, synthetic biology, advanced genome editing, big data, artificial intelligence (AI), and 3D printing together with the growing microbial strain collections enable us to access the previously inaccessible natural products. It starts with genome mining (the analysis of high quality whole genome information), which requires bioinformatics, big data, and even AI; to pathway cloning (refactoring), expression and fermentation, which needs design-build-test-learn (DBTL) cycle-based metabolic engineering; to the target natural product identification, which requires modern chemical analysis; and to later compound modification and clinical studies, which needs biochemistry and cell biology. cord-265857-fs6dj3dp 2010 The completed or ongoing genome projects will provide enormous opportunities for the discovery of novel vaccines and drug targets against human pathogens as well as the improvement of diagnosis and discovery of infectious agents and the development of new strategies for invertebrate vector control. The genomes of human malaria parasite Plasmodium falciparum and its major mosquito vector Anopheles gambiae were published in 2002 (Gardner et al., 2002; Holt et al., 2002). Genome sequencing projects for other important human disease vectors are in progress (Megy et al., 2009). One of the similar efforts for human pathogens is the NIH Influenza Genome Sequencing Project. 
The completed or ongoing genome projects (Table 10 .1) will provide enormous opportunities for the discovery of novel vaccines and drug targets against human pathogens as well as the improvement of diagnosis and discovery of infectious agents and the development of new strategies for invertebrate vector control. cord-267714-ji88tvsl 2009 PCR-based methods have critical limitations, since they depend on a priori knowledge of what sequence to detect in a sample further complicated by recent demonstrations of greater variability in genomic sequence than expected. A platform for genome identification of a specimen from any source must not only be sensitive and specific, but must also detect a variety of pathogens with high accuracy, including modified or previously uncharacterized agents, and this challenge is daunting when identification must be achieved using nucleic acids in a complex sample matrix. The build-out of genome identification DNA sequencing technology in the form of practical instrumentation will be achieved by incorporating the critical requirements for accurate long reads, without dependency for template amplification, capable of manipulating terabytes of data to provide reliable and useful identification of genetic sequences within any unknown sample, whether clinical, environmental, or other type of specimen. cord-268795-tjmx6msm 2020 title: Comparative analyses of SAR-CoV2 genomes from different geographical locations and other coronavirus family genomes reveals unique features potentially consequential to host-virus interaction and pathogenesis We have performed an integrated sequence-based analysis of SARS-CoV2 genomes from different geographical locations in order to identify its unique features absent in SARS-CoV and other related coronavirus family genomes, conferring unique infection, facilitation of transmission, virulence and immunogenic features to the virus. 
Our analysis reveals nine host miRNAs which can potentially target SARS-CoV2 genes. Our analysis shows unique host-miRNAs targeting SARS-CoV2 virus genes. CELLO2GO (7) server was used to infer biological function for each protein of SARS-CoV2 genome with their localization prediction. Assembled SARS-CoV2 genome sequences in FASTA format from India, USA, China, Italy and Nepal were used for coronavirus typing tool analysis. For the phylogenetic analysis, we compared the sequences of 6 SARS-CoV2 isolates from different countries namely, Wuhan, India, Italy, USA and Nepal along with other coronavirus species (Figure 1). cord-269124-oreg7rnj 2019 Examples of tools that have shown their effectiveness with ancient metagenomic DNA include the widely used Basic Local Alignment Search Tool (BLAST) 68 ; the MEGAN Alignment Tool (MALT) 41 , which involves a taxonomic binning algorithm that can use whole genome databases (such as the National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database 69 ); Metagenomic Phylogenetic Analysis (MetaPhlAn) 70 , which is also integrated into the metagenomic pipeline MetaBIT 71 and uses thousands (or millions) of marker genes for the distinction of specific microbial clades; or Kraken 72 , an alignment-free sequence classifier that is based on k-mer matching of a query to a constructed database. Similar limitations can arise when the evolutionary history of a microorganism is vastly affected by recombination, as observed for HBV 44, 53 , although HBV molecular dating was recently attempted using a different genomic data set and suggested that the currently explored diversity of Old and New World primate lineages (including all human genotypes) may have emerged within the last 20,000 years 43 . 
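As a rough illustration of the k-mer matching idea mentioned above for alignment-free classifiers, the following minimal sketch assigns a query read to whichever reference shares the most k-mers with it. It is a toy under stated assumptions, not Kraken's actual algorithm or database format: the function names (kmers, build_index, classify), the two reference sequences and the labels taxon_A and taxon_B are all hypothetical, and a real classifier would also handle reverse complements, taxonomy and ambiguous assignments.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq, in upper case."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Map each k-mer to the set of reference labels that contain it."""
    index = {}
    for label, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(label)
    return index


def classify(read, index, k):
    """Count k-mers shared with each reference and return (best_label, hits).

    best_label is None when no k-mer of the read occurs in the index.
    """
    hits = Counter()
    for kmer in kmers(read, k):
        for label in index.get(kmer, ()):
            hits[label] += 1
    if not hits:
        return None, hits
    best_label, _ = hits.most_common(1)[0]
    return best_label, hits


# Hypothetical toy references; real databases hold whole genomes.
references = {
    "taxon_A": "ATGCGTACGTTAGCCGATAGCTAGGCTA",
    "taxon_B": "TTGACCGGTATATCCGGAACGTTTGACC",
}
index = build_index(references, k=5)
label, hits = classify("CGTACGTTAGCC", index, k=5)
print(label, dict(hits))
```

The index is built once and each read is then classified by dictionary lookups only, which is why k-mer approaches avoid the cost of alignment; the choice of k=5 here is purely for readability and far shorter than values used in practice.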
cord-275683-1qj9ri18 2019 Against the background of an extensive viral diversity revealed by metagenomics across many environments, new sequence assembly approaches that reconstruct complete genome sequences from metagenomes have recently revealed surprisingly cosmopolitan viruses in specific ecological niches. However, these techniques can only detect previously known viruses, and often require Box 1 Use of complementary methods to target different types of viruses A number of approaches have been developed to specifically select and survey the genetic material contained by virus particles in a given sample. Virus sequences obtained from "bulk" metagenomes will typically reflect viruses infecting their host cell at the time of sampling, either actively replicating or not, while viromes enables a deeper and more focused exploration of the virus diversity in a specific site or sample. With viral metagenomics being applied to a larger set of samples and environments, and with bioinformatic analyses including genome assembly and interpretation constantly improving, novel groups of dominant and widespread viruses may thus be progressively revealed across many environments. cord-277687-u3q36o3e 2019 title: VAPiD: a lightweight cross-platform viral annotation pipeline and identification tool to facilitate virus genome submissions to NCBI GenBank In order to accept submitted viral genomic data, NCBI GenBank requires 1) viral sequence complete with at least one protein annotation, 2) author/depositor metadata, and 3) viral sequence metadata, such as strain, collection date, collection location, and coverage. VAPiD handles batch submissions of multiple viruses of different types without prior knowledge of the viral species, correctly annotates RNA editing and ribosomal slippage, performs spellchecking on annotations, handles batch or individual submission of metadata, runs with a simple one-line command, and creates annotated viral sequence files for GenBank submission. 
This first example is the task that the authors originally wrote VAPiD for: annotating large numbers of genomes from different viral species, which mirrors the type of data that many clinical and public health laboratories may encounter. cord-281959-g4sjyytr 2009 The viability of the algorithm is demonstrated by array designs for seven different bacterial pan-genomes and, in particular, the design of a 385,000 probe array that fully tiles the genomes of 20 different Listeria monocytogenes strains with overlapping probes at greater than twofold coverage. In order to both characterize new strains based on genetic content, and detect polymorphism at a higher resolution in small RNAs (sRNAs) and intergenic sequences, the array was required to cover all pan-genomic sequences with a high density of probes. To see the similarities between the Pan-Tiling and Minimum Hitting Set problems, let the sequence G be a concatenation of all the genomes from a species, and let W = {w_1, w_2, ..., w_m} be the set of m intervals that results from segmenting G into non-overlapping, end-to-end windows of length l. cord-297669-22fctxk4 2019 The virus was thought to attach to CD169 to be taken up into the cells; however, genome-edited pigs lacking CD169 were not resistant to PRRSV infection (Prather et al., 2013). Chicken somatic cell lines have been edited to introduce changes to this gene, conferring resistance to avian leucosis virus in vitro (Lee et al., 2017). However, as the example for avian influenza shows, host genes play an important role in other steps of the pathogen replication cycle and also provide editing targets for disease resilience or resistance. Genome editing allows integration of the disease-resistance trait into a wider selection of pigs, ensuring genetic variability and maintenance of desirable traits. 
(D) Resistance genes may be identified in laboratory research but not in highly bred lines, making integration into those productive animals only possible using genome editing. She employs genome editing and genetic selection to generate animals genetically resistant to viral disease. cord-298136-mel9fxw8 2005 Gene patenting is now a familiar commercial practice, but there is little awareness that several patents claim ownership of the complete genome sequence of a prokaryote or virus. However, further analysis reveals that patent specifications describing whole-genome inventions use arguments that imply that genomes are qualitatively different from individual genes. This standard allows several sub-inventions to be linked together by a common "general inventive concept", but prevents unrelated inventions from succeeding as a single If there are any qualitative differences between patents for whole genomes and those for DNA fragments, it seems likely that they will be found in the utility arguments, the most contested feature of recent gene patenting. cord-301709-kvyes2lz 2006 The database will contain high-quality curated data: sequence annotations from published whole and partial genomes; relevant experimental data; metabolic pathway data; taxonomic data; literature citations; and a suite of visualization and analysis tools. The results of these programs and searches assembled by the annotation pipeline are used to propose biological features that are also stored in the curation database that uses the Genomics Unified Schema (GUS). 
For the purposes of defining a minimal, non-redundant set of genes characteristic of the category, one genome (usually the best-known or best-characterized) is identified as the "reference genome"; the remaining members of the class are called "associated genomes." For example, the Tor2 and Urbani isolates were the first two SARS coronavirus genomes to be sequenced and therefore were named as reference genomes. This allows high-value, manually curated information from the corresponding reference genes to be automatically linked to the associated genes, provided minimal similarity criteria based on automated sequence analysis are satisfied. cord-302047-vv5gpldi 2019 Viruses are widely used as vectors for heterologous gene expression in cultured cells or natural hosts, and therefore a large number of viruses with exogenous sequences inserted into their genomes have been engineered. Virus genera covered in relevant studies Conclusions of this review All viruses • Inserted sequences are often unstable and rapidly lost upon passaging of an engineered virus • The position at which a sequence is integrated in the genome can be important for stability • Sequence stability is not an intrinsic property of genomes because demographic parameters, such as population size and bottleneck size, can have important effects on sequence stability • The multiplicity of cellular infection affects sequence stability, and can in some cases directly affect whether there is selection for deletion variants • Deletions are not the only class of mutations that can reduce the cost of inserted sequences, although they are the most common I: dsDNA cord-304498-ty41xob0 2011 Genetic inactivation of ExoN activity in engineered SARS-CoV and MHV genomes by alanine substitution at conserved DE-D-D active site residues results in viable mutants that demonstrate 15- to 20-fold increases in mutation rates, up to 18 times greater than those tolerated for fidelity mutants of other RNA viruses. 
The high mutation rates of RNA viruses also render them particularly susceptible to repeated genetic bottleneck events during replication, transmission between hosts or spread within a host, resulting in progressive deviation from the consensus sequence associated with decreased viral fitness and sometimes extinction. cord-304607-td0776wj 2010 This chapter discusses the current state of play of bioinformatics related to genomics and transcriptomics, and briefly covers metagenomics, which finds use in infectious disease research as well as in the random sequencing of genomes from a variety of organisms. Bioinformatics plays a key role at several steps in genomics, comparative genomics, and functional genomics: sequence alignment, assembly, identification of single nucleotide polymorphisms (SNPs), gene prediction, quantitative analysis of transcription data, etc. The term "metagenomics" was originally used to describe the sequencing of genomes of uncultured microorganisms in order to explore their abilities to produce natural products (Handelsman et al., 1998; Rondon et al., 2000) and subsequently resulted in novel insights into the ecology and evolution of microorganisms on a scale not imagined possible before (see Cardenas and Tiedje, 2008; Hugenholtz and Tyson, 2008 for an overview). However, metagenomics now finds use in infectious disease research as well as the random sequencing of genomes from a variety of organisms from, for example, patient material that could lead to the identification of the cause of disease. 
cord-310406-5pvln91x 2010 RESULTS: We have applied object-oriented technology to develop a downloadable visualization tool, Genome3D, for integrating and displaying epigenomic data within a prescribed three-dimensional physical model of the human genome. In addition, in spite of the many recent efforts to measure and model the genome structure at various resolutions and detail [3-10], little work has focused on combining these models into a plausible aggregate, or has taken advantage of the large amount of genomic and epigenomic data available from new high-throughput approaches. The viewer is designed to display data from multiple scales and uses a hierarchical model of the relative positions of all nucleotide atoms in the cell nucleus, i.e., the complete physical genome. An integrated physical genome model can show the interplay between histone modifications and other genomic data, such as SNPs, DNA methylation, the structure of genes, promoters and transcription machinery, etc. In addition to epigenomic data, the physical genome model also provides a platform to visualize high-throughput gene expression data and its interplay with global binding information of transcription factors. cord-314594-xvc8hvpq 2020 Advances in high-throughput genomics strategies at a whole-genome level, including genetic association mapping, map-based cloning, genomic selection, and speed breeding, have also proven useful in improving genetic gains and expediting the crop improvement process. Through genome-wide association study (GWAS), 60 loci significantly associated with agronomic traits such as oil content, seed quality, and stress tolerance were identified, which may prove to be a valuable resource for genetic improvement (Lu et al. 
Marker-assisted backcrossing (MABC) is the introgression of a genomic region (QTL or locus or gene) contributing the desired trait from a donor genotype into a breeding line or elite cultivar without linkage drag through backcrossing after multiple generations. As the name suggests, CRISPR/Cas9 consists of two components: a single-guide Application of functional and comparative genomics in marker-assisted breeding and biotechnological approaches for crop improvement. The candidate gene(s) identified from functional genomic studies can be introduced through genetic engineering or modified in a targeted manner through genome editing technology in crop species for improved agronomic traits. cord-316033-xg8eb2nm 2020 suum transcripts (Jex et al., 2011; Wang et al., 2017) to the human Ascaris germline assembly to annotate the genome, identifying and classifying 17,902 protein-coding genes (Table 1, Supplementary file 1). As this reference-based assembly exhibits the best assembly attributes, including high continuity with a large N50, low gaps and unplaced sequences, and high-quality protein-coding genes (see Table 1), we suggest that this version should be used as a reference germline genome for a human Ascaris spp. We next took advantage of the abundant reads from the mitochondrial genome in our sequencing data (on average 7690X coverage, see Supplementary file 1) to perform de novo assembly of 68 complete human Ascaris spp. Furthermore, there were no significant associations between mitochondrial sequence variations and other factors (e.g. village, household, time of worm collection, host) based on PERMANOVA (see methods and Table 2) after translating the phylogenetic tree into a distance matrix, suggesting not only a lack of differentiation into distinct species but also a potentially large interbreeding population of worms being transmitted between individuals and across villages. 
cord-318392-r9bbomvk 2017 The genomes of two Coronavirus HKU15 strains detected in the nasopharyngeal samples of two different pigs were sequenced following our previous publications [26, 27] with modifications. Divergence times for the Coronavirus HKU15 strains were calculated based on the complete genome sequence data, utilizing the Bayesian Markov chain Monte Carlo method using BEAST 1.8.0 [33] with the substitution model GTR (general time-reversible model)+G (gamma-distributed rate variation)+I (estimated proportion of invariable sites), a strict molecular clock, and a constant coalescent. In one (S579N) of the two Coronavirus HKU15 genomes that we sequenced in this study, variant sites were observed at four positions; two of them were due to nucleotide substitutions, and the other two were results of indels at mononucleotide polymeric regions (189th and 376th bases). cord-320005-i30t7cvr 2004 The HGP's initial objectives were fulfilled 2 years ahead of schedule, and, in addition to compiling a highly accurate sequence of the human genome which has been made freely available and accessible to everyone, the Consortium has developed a set of new technologies and has constructed genetic maps of the genomes of various organisms. Around the same time, the public consortium known as the Human Genome Project was formed, and this organization announced a 15-year plan (from 1990 to 2005) with the following objectives: a) to determine the complete nucleotide sequence of human DNA and identify all the genes in human DNA (estimated to number between 50 000 and 100 000); b) to build physical and genetic maps; c) to analyze the genomes of selected organisms used in research as model systems (e.g., the mouse); d) to develop new technologies; and e) to analyze and debate the ethical and legal implications for individuals and for society as a whole. 
cord-324811-yjwavea5 2005 Oligonucleotide microarrays, predominantly high-density oligonucleotide arrays, have emerged as the principal platforms for performing genome-wide diversity analysis. Since a number of complex issues still remain with high-throughput microarray-based SNP genotyping in humans, in the remainder of this review, we will discuss the application of high-density oligonucleotide arrays to elucidate genetic diversity, with particular focus on studies undertaken with Saccharomyces cerevisiae (Winzeler et al. falciparum (Clark 2002), the genome-wide analysis facilitated by hybridization of genomic DNA to the Affymetrix microarray identified significant differences in potential selection pressure across different gene families and locations within the chromosome (Volkman et al. Although SNPs and deletions can be readily identified using Affymetrix high-density arrays, more complex types of genetic diversity may also be determined using this platform. cord-330312-1pjolkql 2017 One of the important motivations for these efforts is to develop preventative, diagnostic, and therapeutic strategies through the analysis of sequenced microorganisms, parasites, and vectors related to human health. 16, 17 The genomes of human malaria parasite Plasmodium falciparum and its major mosquito vector Anopheles gambiae were published in 2002. 30-32 Genome-sequencing projects for other important human disease vectors are in progress. 38 One of the similar efforts for human pathogens is the NIH Influenza Genome Sequencing Project. 48 The completed or ongoing genome projects (Table 10.1) provide enormous opportunities for the discovery of novel vaccines and drug targets against human pathogens as well as the improvement of diagnosis and discovery of infectious agents and the development of new strategies for invertebrate vector control. 
Genome sequence of the human malaria parasite Plasmodium falciparum cord-334394-qgyzk7th 2020 To address the ongoing pandemic caused by Severe Acute Respiratory Syndrome Coronavirus 2 and expand the known sequence diversity of viruses, we aligned pangenomes for coronaviruses (CoV) and other viral families to 5.6 petabases of public sequencing data from 3.8 million biologically diverse samples. To expand the known repertoire of viruses and catalyse global virus discovery, in particular for the Coronaviridae (CoV) family, we developed the Serratus cloud computing architecture for ultra-high throughput sequence alignment. We aligned 3,837,755 public RNA-seq, meta-genome, meta-virome and meta-transcriptome datasets (termed a sequencing run [5]) against a collection of viral family pangenomes comprising all GenBank CoV records clustered at 99% identity plus all non-retroviral RefSeq records for vertebrate viruses (see Methods and Extended Table 1). We performed de novo assembly on 52,772 runs potentially containing CoV sequencing reads by combining 37,131 SRA accessions identified by the Serratus search with 18,584 identified by an ongoing cataloguing initiative of the SRA called STAT [5]. cord-340423-f8ab7413 2016 We then discuss evidence that at least some RNA viruses have a replication fidelity that is poised to maximize genome sequence space without incurring catastrophic lethal mutations and describe how this can be exploited to control viral infections. The error-prone nature of polymerase activity, coupled with the absence of a proofreading mechanism, is the key reason why RNA virus genomes acquire mutations and exist as a swarm of genetic variants. The mutation rate of the viral polymerase, coupled with the replication mode that the virus employs (and extrinsic factors, described in the following text), will determine the extent of genetic variability of viruses released from an infected cell. 
Thus, it is possible that the high mutation rates of RNA viruses are simply a consequence of polymerases that are under selective pressure to replicate genomes very rapidly to ensure efficient viral infection [79-81]. cord-346335-el45v0a5 2020 We uncover an interesting, new scaling law for the coronavirus genome: the complexity of the genome scales linearly with the power-law exponent that characterizes the enveloping curve of the low-frequency domain of the spectral density. An example of a seminal paper in this subject is that of Voss in [2], where the author found that the spectral density of the genome of many different species follows a power law of the form 1/k^β in the low-frequency domain, with the exponent β potentially related to the organism's evolutionary category. We develop a few models to characterize the typical spectrum, and in the process stumble upon a linear scaling law between a measure of the complexity of each genome and the power-law exponent that describes the enveloping curve of the low-frequency domain. cord-348059-wa1gjbck 2020 Thirty years on from the launch of the Human Genome Project, Richard Gibbs reflects on the promises that this voyage of discovery bore. He developed basic methods for DNA and mutation analysis and was an early contributor to the Human Genome Project (HGP), leading one of five sites that generated the majority of the sequence. The power of advances in genomics and computers was revealed in the spectacular series of post-HGP projects that were of comparable scale. 
Some still tally the success of the HGP from lists of new drugs or therapies and argue that world-changing examples in biology, such as the spectacular advances of gene editing tools or the expansion of cancer therapeutics through targeted immunotherapy, are largely based on microbial, cellular and animal studies rather than genomics. cord-348515-bqqyly23 2014 Recombination analysis reveals this genome differs from the 1950s-era prototype and vaccine strains by a lateral gene transfer, substituting the coding region for the L1 52/55 kDa DNA packaging protein from HAdV-16. Thorough characterization of these pathogens is evidenced by the availability of two genome sequences (JF800905 and JX625134), both of which are further identified as the HAdV-7d genome type in this report, and shown to be nearly identical to this report of an isolate from a 2011 ARD outbreak in Guangdong Province (strain DG01_2011) by comparative genomics and, in particular, in silico REA pattern analysis, as presented in Figure 2. cord-350747-5t5xthk6 2005 It was believed until recently that the only possible mechanism of RNA recombination is replicative template switching, with synthesis of a complementary strand starting on one viral RNA molecule and being completed on another. An illustrative example of deletions is provided by defective interfering (DI) genomes, which accumulate in a virus population upon high-multiplicity infections and lack a fragment of the sequence coding for viral proteins [5-7]. A special role in the variation of RNA viruses is played by recombination, the generation of new genomes from two or more parental RNAs. Recombination between viral RNA molecules was observed for the first time as early as the 1960s in the poliovirus [14, 15]. 
In other words, it is possible to assume that some of the mechanisms of nonreplicative RNA recombination play an important role in the evolution of not only viral, but also cell genomes [51, 90]. cord-352619-s2x53grh 2020 Genomes from several families of circular Rep-encoding single-stranded DNA viruses (CRESS-DNA viruses) are part of the phylum Cressdnaviricota [22] and have been identified in fecal samples of other mammals, including domestic cats [23, 24], bobcats, African lions [25], capybaras [26], and Tasmanian devils [27]. Here we used a metagenomic approach to identify novel circoviruses in the feces of two species of Sonoran felids, the puma and bobcat; although not endangered, knowledge of viral threats facing these species could help prevent future population decline, as well as indicate potential threats to the endangered ocelot and jaguar. Based on the species-demarcation threshold for circoviruses, which is 80% genome-wide identity [28], both of these belong to a new species which we refer to as Sonfela (derived from Sonoran felid associated) circovirus 1. As the viral genomes were derived from scat samples, the circoviruses could have infected the bobcat prey species or the felids themselves or be environmentally derived.