key: cord-0845422-zaimorg0
authors: Ratra, Ruchi; Lal, Sunil K.
title: Functional genomics as a tool in virus research
date: 2008-06-01
journal: Indian Journal of Microbiology
DOI: 10.1007/s12088-008-0032-3
sha: 43b8f5f88b4c780d2701f2343adc21b84c3f7108
doc_id: 845422
cord_uid: zaimorg0

Genomics is the study of an organism's entire genome. It began as a major scientific endeavor in the 1990s that aimed to sequence the complete genomes of selected biological species. Viruses, however, are not new to this field: complete viral genomes have been sequenced routinely for the past thirty years. The 'genomic era' is said to have revolutionized biology. Knowledge of full genomes has created the field of functional genomics in today's post-genomic era, which is for the most part concerned with studying the expression of an organism's genome under different conditions. This article introduces its readers to the application of functional genomics to several complex biological questions in virus research.

The genomics era has revolutionized the biological sciences and has heralded the emergence of new 'omics' methodologies such as transcriptomics (the study of gene expression and mRNA expression levels at a given time and condition), proteomics (the study of the entire protein content of a cell or tissue under various conditions, including protein structure and function), metabolomics (the study of the metabolite profile of different cellular processes), phosphoproteomics (a branch of proteomics that characterizes phosphorylated proteins), and interactomics/systems biology (a science that unifies transcriptomics, proteomics and metabolomics to look at the organism as a whole). For years, scientists have followed the reductionist approach, in which a single gene and its functions and activities are studied in isolation.
Although this branch of experimental biology is indispensable, biological processes are now being looked at as a whole. Functional genomics has been defined by Hieter and Boguski as 'the development and application of global (genome-wide or system-wide) experimental approaches to assess gene function by making use of the information and reagents provided by genome sequencing and mapping' [1]. Functional genomics makes use of the vast wealth of data generated by the various genome projects (which determine the complete genome sequence of an organism) to describe genes (and their encoded proteins), their functions and their interactions. It describes a biological process by taking into account the mutations and polymorphisms inherent in the genome that determine its functions, as well as the dynamic aspects of gene expression (transcriptomics) and protein expression (proteomics), such as gene transcription, translation, and protein-protein interactions.

R. Ratra · S. K. Lal, Virology Group, International Centre for Genetic Engineering & Biotechnology, New Delhi, India; e-mail: sunillal@icgeb.res.in

The number of sequences deposited in the sequence databases has surged and continues to grow rapidly. Viruses were in fact the first organisms for which complete genome sequences were made available. In 1976, Walter Fiers was the first to establish the complete nucleotide sequence of a viral RNA genome (bacteriophage MS2) [2]. The first DNA genome to be completely sequenced was that of phage ΦX174, by Fred Sanger in 1977 [3]. Several databases have been developed to provide access to the sequenced viral genomes [4, 5]. A review by Kellam and Alba [6] discusses the impact of bioinformatics on virology.
The burgeoning number of complete viral genome sequences is evident from NCBI Entrez Genomes, which contained 2798 Reference Sequences for 1843 viral genomes and 36 Reference Sequences for viroids as of 22 June 2007 (http://www.ncbi.nlm.nih.gov/genomes/VIRUSES/viruses.html) [7]. These complete viral genome sequences serve as the starting point for functional genomics. Sequence analysis and comparison are the basis of molecular phylogeny and provide insights into virus evolution. Such studies in herpesviruses have led to the ideas of co-speciation and the pirating of cellular genes in this viral family [8, 9]. Functional genomics has been used to construct phylogenetic trees by defining the functional gene content of the organism. Bioinformatics applied to complete genome sequences has been used to identify families of conserved genes [10] or protein families based on the presence of conserved sequence motifs.

The methods used in functional genomics are mostly high-throughput techniques that can be applied on a genome-wide scale and generate large datasets to aid the understanding of gene function. Some well-standardized and widely used approaches are DNA microarrays [11-14] and SAGE (serial analysis of gene expression) [15] for quantifying mRNA populations, and two-dimensional gel electrophoresis with mass spectrometry [16] or high-throughput yeast two-hybrid screens [17] for proteins. To date, the most successful functional genomics tool is global gene expression analysis using DNA microarrays [18]. DNA arrays consist of synthetic oligonucleotides or PCR probes immobilized on solid surfaces such as glass or a nitrocellulose or nylon membrane. High-density arrays, which have thousands of individual probes per unit area, are referred to as microarrays.
Data are obtained by hybridization between the probes and labeled sequences in the applied samples, which is revealed by scanning or imaging the array surface [19]. Bioinformatics, or computational biology, is a crucial tool for managing, analyzing and integrating the huge amounts of experimental data generated by these techniques [20]. Other techniques for high-throughput genome-wide analysis that are now becoming popular are chromatin immunoprecipitation arrays (ChIP-on-chip) [21], tiling arrays [22], high-content fluorescence microscopy [23], and RNA interference screens [24]. A detailed discussion of these techniques is beyond the scope of this article.

The functional genomics approach has been routinely applied to study viral replication, gene expression, evolution, diagnostics and more. Categorized below are a few of the areas of research in virus biology to which functional genomics has been successfully applied. Neither the list of research areas nor the examples cited are exhaustive; they serve only as an introduction to the field, outlining its breadth and highlighting its potential for solving seemingly complex problems in virology.

DNA microarrays, proteomics and bioinformatic analysis are routinely used to analyze the changes in host and viral gene and protein expression that occur in a virus-infected cell [25].

The functional genomics approach has also been used to study the cellular innate immune response to viruses. Infecting HeLa cells with wild-type or heat-inactivated influenza virus and then monitoring cellular gene expression demonstrated that influenza virus modulates cellular events by both replication-dependent and replication-independent pathways [34]. It has also been confirmed that the NS1 gene product of influenza virus functions as an interferon antagonist, since infection with a virus lacking this gene results in a significant increase in the expression of genes involved in interferon signaling [38, 39].
Such studies have provided new insights into influenza virus pathogenesis. Similar studies of adenovirus infection of HeLa cells showed that the virus modulates the expression of a limited set of cellular genes. Regulation of E2F-dependent transcription by the viral E1A protein was found to be a major pathway for modulating cellular gene expression. Among the other genes found to be up- or down-regulated, several cytokines involved in the innate immune response were down-regulated [40]. A functional genomic analysis of herpes simplex virus type 1 (HSV-1) infection in mouse embryo fibroblasts demonstrated that the expression and function of the viral gene product ICP34.5 at early times post-infection plays a pivotal role in the ability of HSV-1 to usurp host metabolic and biosynthetic processes for virus propagation. By simultaneously evading the innate immune response through dephosphorylation of eIF2α, ICP34.5 helps maintain an environment for successful viral replication [41].

Paul Ahlquist's group at the Howard Hughes Medical Institute has applied the functional genomics approach to fundamental questions in virus replication, gene expression and virus-cell interactions, particularly for positive-strand RNA viruses (including hepatitis C virus, SARS coronavirus and brome mosaic virus, BMV) and DNA tumor viruses (hepatitis B virus). To globally identify the host factors that function in viral replication, transcription and translation, BMV RNA replication was assayed in each strain of an ordered genome-wide set of yeast single-gene deletions [42, 43]. Approximately 4500 yeast deletion strains were screened. This approach revealed nearly 100 genes whose absence inhibited or stimulated BMV RNA replication and/or gene expression by 3- to >25-fold. Several of the identified genes were involved in RNA, protein, or membrane modification pathways, and many of them were known players in BMV replication.
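The hit-calling step of a deletion screen like the one described above can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the method used in the cited studies [42, 43]: the function name `call_hits`, the strain labels, the signal values and the exact handling of undetectable signal are all hypothetical. Only the 3-fold cutoff and the idea of comparing each deletion strain against wild type come from the text.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    strain: str      # label of the deletion strain (hypothetical names below)
    fold: float      # magnitude of change relative to wild type, always >= 1
    direction: str   # "inhibited" or "stimulated"


def call_hits(signal_by_strain, wild_type_signal, min_fold=3.0):
    """Flag strains whose readout differs from wild type by at least min_fold.

    signal_by_strain maps a strain label to a replication or expression
    readout measured in the same units as wild_type_signal.
    """
    if wild_type_signal <= 0:
        raise ValueError("wild_type_signal must be positive")
    hits = []
    for strain, signal in signal_by_strain.items():
        if signal <= 0:
            # Assumption: no detectable signal is reported as unbounded inhibition.
            fold, direction = float("inf"), "inhibited"
        elif signal >= wild_type_signal:
            fold, direction = signal / wild_type_signal, "stimulated"
        else:
            fold, direction = wild_type_signal / signal, "inhibited"
        if fold >= min_fold:
            hits.append(Hit(strain, fold, direction))
    # Strongest effects first.
    return sorted(hits, key=lambda h: h.fold, reverse=True)


# Hypothetical readouts, in arbitrary units; wild type is set to 100.
example = {"del_A": 100.0, "del_B": 30.0, "del_C": 4.0, "del_D": 350.0, "del_E": 0.0}
hits = call_hits(example, wild_type_signal=100.0)
for h in hits:
    print(h.strain, h.direction, h.fold)
```

Reporting the fold change as a magnitude with a separate direction keeps inhibition and stimulation on the same scale, so a single threshold such as 3-fold applies to both. A real screen would also need replicates and a statistical test, which this sketch omits.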
Another study from the same lab identified the molecular mechanisms by which Epstein-Barr virus (EBV)-associated epithelial cancers are maintained [44]. The expression of all human genes was studied in 31 nasopharyngeal carcinoma (NPC) tissue samples and 10 normal nasopharyngeal tissues. Global gene expression profiles clearly distinguished tumors from normal healthy epithelium. The expression levels of the viral genes EBNA1, EBNA2, EBNA3A, EBNA3B, LMP1, and LMP2A were found to be correlated among themselves and inversely correlated with the expression of a large subset of host genes, such as multiple MHC class I HLA genes involved in regulating the immune response via antigen presentation. This association between EBV gene expression and inhibition of MHC class I HLA expression might facilitate immune evasion by tumor cells, and/or such tumor cells might sustain higher levels of EBV. Further, the functional genomics approach established that key proteins involved in apoptosis, cell cycle checkpoints, and metastasis were deregulated and that their expression was closely correlated with the levels of EBV gene expression in NPC.

Functional genomics was used on potato virus A to map sites essential for virus propagation [45]. Using transposition-based in vitro insertional mutagenesis, a library of viral genomic mutants carrying 15-bp insertions was generated. The proficiency of 1125 such mutants to propagate in tobacco protoplasts was analyzed while simultaneously mapping the genomic insertion sites. Over 300 sites critical for virus propagation were thus identified, many of them located in positions not previously assigned to any viral function. The methodology is applicable to the detailed functional analysis of any viral nucleic acid cloned as DNA and can be used to address many different processes during viral infection cycles.

Not only the viral genome but also the host genome is essential for the viral life cycle.
The viral and host factors that determine whether a viral infection results in viral replication and propagation or in elimination by the host immune response are still largely unknown. In addition to specific viral genes, host genes essential for the viral life cycle may serve as antiviral targets, because viruses employ several strategies to alter host gene expression. Many experimental studies have sought to establish the different cellular pathways modulated by viruses, and whether a common theme exists for all viruses or each virus behaves differently. An understanding of how a cell responds to a viral infection and of the final outcome, together with the identification of potential targets by global, systematic approaches, may lead to novel diagnostic techniques based on the 'molecular signature' of the virus.

One such example is the identification of eIF2Bγ and eIF2γ as cofactors of the hepatitis C virus (HCV) translational machinery using a functional genomics approach [46]. The 5' untranslated region of HCV functions as an internal ribosome entry site (IRES) to initiate translation of HCV proteins. Using a randomized retroviral vector ribozyme gene library, two ribozymes were identified that inhibited HCV IRES-mediated translation but did not inhibit cap-dependent protein translation or cell growth. The functional targets of these ribozymes were identified as the gamma subunits of human eukaryotic initiation factors 2B and 2, respectively. In addition to identifying cellular IRES cofactors, ribozymes obtained from this cellular selection system could be used directly to specifically inhibit HCV translation, thereby facilitating the development of new antiviral strategies for HCV infection.

Microarrays are also being used increasingly in viral diagnosis.
A microarray for the detection of five viruses, 11 prokaryotes and two eukaryotes that are potential agents of biological warfare has been developed by Wilson et al. [47]. A spot array has been described for the specific detection of enterovirus 71 [48]. DNA microarrays have also been applied to the molecular typing of pathogenic viruses, detecting the presence or absence of specific viruses for patient management, epidemiological surveillance and transmission studies, or vaccine use. The Affymetrix HIV-1 GeneChip was the first commercial microarray used in clinical virology. It was initially developed for genotyping the protease gene, and later also the reverse transcriptase gene [49]. An influenza virus microarray based on four haemagglutinin, three neuraminidase and two matrix protein gene targets from five different influenza virus strains has been described [50]. A similar microarray has been developed for the molecular typing of rotaviruses [51].

Using bioinformatics analysis methods, Simmonds et al. [52] identified genome-scale ordered RNA structure (GORS) in many genera and families of positive-strand animal and plant RNA viruses. The authors observed genus-associated variability in members of the family Flaviviridae (for example, hepaciviruses showed evidence of extensive internal base-pairing, unlike the related pestivirus and flavivirus genera), the Picornaviridae, the Caliciviridae, and many plant virus families. The existence of such evolutionarily conserved GORS correlated strongly with the ability of each genus to persist in its natural hosts, pointing to a possible role for GORS in modulating the innate and acquired host immune responses.

Viruses are classified broadly into seven groups on the basis of the composition of their nucleic acids and their strategy for replication.
Despite major differences among these classes, recent results revealed multiple, detailed structural and functional parallels among the replication complexes of four of the seven virus classes: positive-strand RNA viruses, dsRNA viruses, and reverse-transcribing viruses (retroviruses and pararetroviruses) [53]. These viruses share several underlying features in genome replication and might have emerged from common ancestors. This has implications for virus function, evolution and control.

There is a huge demand for new methods of viral discovery (to identify and characterize novel or unrecognized viral pathogens in human and animal diseases) because of the constant threat posed by emerging and re-emerging viral infectious diseases. In addition, the etiological agents of many diseases are largely unknown; for example, approximately 70% of cases of viral encephalitis and 30% of respiratory tract infections are of unknown etiology. David Wang's group described a prototype DNA microarray designed for highly parallel viral detection, with the potential to detect novel members of known viral families [54]. This microarray contained approximately 1600 oligonucleotides representing 140 viruses. Wang's group then developed a more comprehensive second-generation DNA microarray platform for novel virus identification and characterization [55]. This microarray contained the most highly conserved 70mer sequences from every fully sequenced reference viral genome in GenBank (as of August 15, 2002), to maximize the probability of detecting unknown and unsequenced members of existing families by cross-hybridization to these array elements. On average, ten 70mers were selected for each virus, totaling approximately 10,000 oligonucleotides from approximately 1,000 viruses. This pan-viral microarray was used as part of the global effort to identify a novel virus associated with severe acute respiratory syndrome (SARS) in March 2003 [56, 57].
During the outbreak, hybridization to this microarray revealed the presence of a previously uncharacterized coronavirus in a viral isolate cultivated from a SARS patient. To further characterize the new virus, approximately 1 kb of the unknown viral genome was cloned by physically recovering viral sequences hybridized to individual array elements. Sequencing of these fragments confirmed that the virus was indeed a new member of the coronavirus family. This combination of array hybridization followed by direct viral sequence recovery should prove to be a general strategy for the rapid identification and characterization of novel viruses and emerging infectious diseases.

The functional genomics approach has been used extensively in cancer biology to identify tumor-specific pathways [58-61]. Microarray-based methods have been used to distinguish HBV from HCV chronic hepatitis on the basis of differentially expressed genes [62, 63]. Broad analysis of the types of genes involved showed that in HBV infection, genes involved in apoptosis, cell cycle arrest and extracellular matrix degradation were up-regulated, whereas in HCV infection, genes involved in cell cycle acceleration and extracellular matrix storage were up-regulated. In a related study of hepatocellular carcinoma (HCC), the major difference between HBV- and HCV-derived tumors was the up-regulation in HCV and down-regulation in HBV of genes involved in activating chemotherapeutic drugs or detoxifying xenobiotic carcinogens [62]. The functional genomics approach has thus been used successfully to study the biology and pathogenesis of HBV and HCV.

Functional genomics involves the use of the genome sequence information of an organism, coupled with experimentally derived transcriptomics and proteomics data, to study the organism as a whole.
As reviewed here, such studies advance our knowledge of basic and applied virology and provide a greater understanding of viral pathogenesis. Functional genomics studies have aided the development of new viral diagnostics and therapeutics. As more viral genomes are sequenced and new, more advanced computational tools become available to analyze, integrate, model and interpret the burgeoning amount of data, new and valuable information about viral systems will emerge. Functional genomics data need to be made publicly available to hasten the pace of research in this field. Such studies have provided further insight into host-pathogen interactions and will continue to do so in the future. Functional genomics studies are increasingly expanding across the boundaries of traditional biological disciplines. The field is still in its infancy, but it is growing at a steady rate.

[1] Functional genomics: it's all how you read it
[2] Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene
[3] Nucleotide sequence of bacteriophage phi X174 DNA
[4] Viral Genome DataBase: storing and analyzing genes and proteins from complete viral genomes
[5] VIDA: a virus database system for the organization of animal virus genome open reading frames
[6] Virus bioinformatics: databases and recent applications
[7] National center for biotechnology information viral genomes project
[8] Molecular phylogeny and evolutionary timescale for the family of mammalian herpesviruses
[9] Kaposi's sarcoma-associated herpesvirus: a new DNA tumor virus
[10] A genomic perspective on protein families
[11] Microarrays under the microscope
[12] Microarray techniques in pathology: tool or toy?
[13] Genomics, gene expression and DNA arrays
[14] DNA chips: state-of-the art
[15] Serial analysis of gene expression
[16] Proteome analysis by two-dimensional gel electrophoresis and mass spectrometry: strengths and limitations
[17] Interactome: gateway into systems biology
[18] Biomedical discovery with DNA arrays
[19] DNA microarray technology: devices, systems, and applications
[20] Principles of gene microarray data analysis
[21] Locating mammalian transcription factor binding sites: a survey of computational and experimental techniques
[22] Issues in the analysis of oligonucleotide tiling microarrays for transcript mapping
[23] High-throughput fluorescence microscopy for systems biology
[24] High-throughput approaches to dissecting MAPK signaling pathways
[51] ... human group A rotaviruses by oligonucleotide microarray hybridization
[52] Detection of genome-scale ordered RNA structure (GORS) in genomes of positive-stranded RNA viruses: Implications for virus evolution and host persistence
[53] Parallels among positive-strand RNA viruses, reverse-transcribing viruses and double-stranded RNA viruses
[54] Microarray-based detection and genotyping of viral pathogens
[55] Viral discovery and sequence recovery using DNA microarrays
[56] A novel coronavirus associated with severe acute respiratory syndrome
[57] Characterization of a novel coronavirus associated with severe acute respiratory syndrome
[58] In vivo gene expression profile analysis of human breast cancer progression
[59] Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer
[60] Alterations of gene expression during colorectal carcinogenesis revealed by cDNA microarrays after laser-capture microdissection of tumor tissues and normal epithelia
[61] Metastasis-associated differences in gene expression in a murine model of osteosarcoma
[62] Genome-wide analysis of gene expression in human hepatocellular carcinomas using cDNA microarray: identification of genes involved in viral carcinogenesis and tumor progression
[63] Differential gene expression between chronic hepatitis B and C hepatic lesion