key: cord-0705559-ee23ibkq authors: Foxman, Betsy title: Introduction and Historical Perspective date: 2011-01-27 journal: Molecular Tools and Infectious Disease Epidemiology DOI: 10.1016/b978-0-12-374133-2.00001-0 sha: 7a165e6d864a6ad72640d1b6a072720c248bca9b doc_id: 705559 cord_uid: ee23ibkq The term molecular epidemiology emerged apparently independently during the 1970s to early 1980s in the literature of three separate substantive areas of epidemiology: cancer epidemiology, environmental epidemiology, and infectious disease epidemiology. Although these separate substantive areas agree that epidemiology refers to the distribution of disease in a population and the determinants of that distribution, the literature presents conflicting definitions of what makes a study a molecular epidemiologic study. In cancer and environmental epidemiology, molecular is defined almost exclusively in terms of biomarkers. However, biomarkers are only one type of molecular measure, and this definition ignores the many applications of molecular methods in genetic and infectious disease epidemiology. Early epidemiologists made tremendous strides with what are now relatively simple molecular tools, such as using microscopy for identification, showing agents not visible by microscope cause disease, and detecting protective antibodies with hemagglutination assays. In the microbiology literature, molecular epidemiology has become synonymous with the use of molecular fingerprints—regardless of whether the study was population based or met other criteria consistent with an epidemiologic study. Moreover, the molecular tools available, and the potential for applications for studies of populations, have changed substantially since the term molecular epidemiology was coined. 
Molecular epidemiology is the discipline that combines molecular biology with epidemiology; this means not merely using molecular techniques in epidemiology or population approaches in molecular biology, but a marriage of the two disciplines so that molecular techniques are taken into account during study design, conduct, and analysis. There is no consensus definition for the term molecular epidemiology in the literature (see Foxman and Riley 1 for a review). The term molecular epidemiology emerged apparently independently during the 1970s to early 1980s in the literature of three separate substantive areas of epidemiology: cancer epidemiology, environmental epidemiology, and infectious disease epidemiology. Although these separate substantive areas agree that epidemiology refers to the distribution of disease in a population and the determinants of that distribution, the different literatures present conflicting definitions of what makes a study a "molecular" epidemiologic study. In cancer and environmental epidemiology, molecular is defined almost exclusively in terms of biomarkers. However, biomarkers are only one type of molecular measure, and this definition ignores the many applications of molecular methods in genetic and infectious disease epidemiology. In the microbiology literature, molecular epidemiology has become synonymous with the use of molecular fingerprints, regardless of whether the study was population based or met other criteria consistent with an epidemiologic study. Moreover, the molecular tools available, and the potential for applications for studies of populations, have changed substantially since the term molecular epidemiology was coined. Since the 1980s there has been an explosion of molecular techniques and of technologies that enable their application to large numbers of individuals, a requirement for epidemiology. In the 1980s, the identification of a single bacterial gene would warrant a dissertation.
Now we can obtain the entire genetic sequence of a bacterium, such as Escherichia coli whose genome is ~5.5 million base pairs, in a few days (although making sense of the sequence takes a good deal longer). There are databases of genetic sequence of humans, mice, and other animals, and of major human pathogens whose content is growing daily. These databases enable sequence comparisons within and among species, giving insight into possible functions of new genes and the relationships among species. We have vastly improved techniques for determining when genes are turned on and off and under what circumstances, making it significantly easier to characterize proteins. Further, we can measure how the environment changes genome function; these changes, termed the epigenome, can be inherited. These advances all deal with material on the molecular level, and "molecular" has become a synonym for modern molecular techniques that characterize nucleic and amino acids, sometimes including metabolites (the "omics": genomics, transcriptomics, proteomics, metabolomics). Though apparently very broad, this definition of "molecular" excludes many laboratory techniques applied to biological material that might be usefully included in the study of the distribution and determinants of population health and disease (the definition of epidemiology). Thus, for the purposes of this text, molecular is defined as any laboratory technique applied to biological material. However molecular is defined, for a study to be molecular epidemiology, laboratory techniques must be integrated with epidemiologic methods; this integration has profound implications for the design, conduct, and analysis. Molecular biological techniques enhance measures of diagnosis, prognosis, and exposure, reducing misclassification and increasing power of epidemiologic studies to understand the etiology.
However, molecular measures may impose strict requirements on data collection and processing, so that the choice of measure dictates the epidemiologic study design. If the measure of the construct of interest is time dependent or storage sensitive, the design is constrained to collection at the relevant time point and the conduct must enable rapid testing. Molecular techniques are generally highly sensitive, specific, and discriminatory; this can increase the power of a study, reducing the sample size required. The resulting measures may also require different types of analysis. The investigator must understand what the laboratory result is really measuring: it may detect acute exposure or exposure at any time in the past. The analysis and interpretation must be adjusted accordingly, as associations with acute exposure predict acquisition, but associations with exposure at any time in the past also reflect survival. Thus, a molecular epidemiologic study differs from an epidemiologic study that uses molecular techniques: a molecular epidemiologic study represents a true merger of the disciplines. This implies that the application of epidemiologic methods to the laboratory is also molecular epidemiology. When epidemiologic methods are applied in a laboratory setting, the focus on representative samples and population distributions illuminates the heterogeneity of microbial populations, and of human immune response to those populations, leading to more nuanced interpretations of results from model organisms. Molecular techniques make it possible to distinguish between infectious agents of the same species with discriminatory power that is far beyond that possible using phenotypic comparisons. This ability has enabled more definitive identification of sources of disseminated food-borne outbreaks (E.
coli O157:H7 spread by spinach), demonstration of criminal intent (intentional infection with HIV), and has led us to rethink our understanding of the epidemiology of infectious agents (transmission of Mycobacterium tuberculosis). By characterizing the genetics of infectious agents, we have gained insight into their heterogeneity and rapidity of evolution, highlighting why some previous vaccine development efforts have been unsuccessful. The ability to measure host response to infectious agents has also revealed that the extent and duration of immunity are somewhat different from what was previously thought; lifetime immunity for some infections may only result from boosting from subclinical infection; as we bring infectious agents under control, vaccination schedules must change commensurately. Finally, we are increasingly able to identify the role of infectious agents in the initiation and promotion of diseases previously classified as chronic. When successful, molecular epidemiologic studies help to identify novel methods of disease prevention and control, markers of disease diagnosis and prognosis, and fertile research areas for identifying potential new therapeutics, vaccines, or both. While the integration of the molecular with the epidemiological can be very simple, for example, using a laboratory diagnostic measure rather than self-report in an epidemiologic study or describing the distribution of a genetic variant in a collection of bacterial strains, the ultimate success of molecular epidemiologic studies depends upon how well the concerns of each field are integrated. Thus, incorporating a molecular tool that measures the desired outcome or exposure is not sufficient; the strengths and limitations of the chosen measure must be considered in the design, conduct, analysis, and interpretation of the study results.
Within an infectious disease context, molecular epidemiology often refers to strain typing or molecular fingerprinting of an infectious agent; within microbiology, molecular epidemiology generally refers to phylogenetic studies. The field of seroepidemiology, screening blood for past exposure to infection, also falls under the umbrella of molecular epidemiology. However, the realm of molecular epidemiology is much larger, and the potential is much broader than strain typing or phylogeny or testing sera for antibodies, because infectious disease includes two each of genomes, epigenomes, transcriptomes, proteomes, and metabolomes, reflecting the interaction of the infectious agent with the human host. Molecular tools now make it possible to explore this interaction. Molecular tools are increasingly integrated into epidemiologic studies of environmental exposures, cancer, heart disease, and other chronic diseases. Thus, although the examples in this book all relate to infectious diseases, many of the underlying principles hold for molecular epidemiologic studies of noninfectious diseases. In the remainder of this chapter, I give an historical perspective on the use of molecular tools in epidemiology, then some examples of the range of molecular epidemiologic studies focusing on infectious disease. I close with a discussion of what distinguishes new molecular tools from those used previously, and what distinguishes modern molecular epidemiology from previous studies using laboratory methods. Both in historic and contemporary studies, the inclusion of laboratory evidence strengthens the inferences made using epidemiologic techniques. One epidemiologic hero is John Snow, who identified a strong epidemiologic association between sewage-contaminated water and cholera. Despite extremely well-documented evidence supporting his arguments, his findings remained in doubt for some time. 
Max von Pettenkofer, 1818-1901, a contemporary of Snow and also an early epidemiologist, is said, in a perhaps apocryphal story, to have remained cholera free despite drinking a glass containing the watery stool of someone with cholera. From a modern perspective, this demonstrates the importance of infectious dose and host immunity on disease pathogenesis. Snow's conclusions of a causal link were not generally accepted until 25 years after his death, when the cholera vibrio was discovered by Robert Koch. This discovery enabled Koch to definitively demonstrate the causal relationship between the vibrio and cholera. 2 The strategy of isolating an organism from an ill individual, showing it can cause disease in a disease-naïve individual, and then reisolating it, as described in the landmark postulates of Henle and Koch, reflects how incorporating laboratory methods enhances our ability to make causal inferences about disease transmission and pathogenesis from even the most carefully researched epidemiologic evidence. Although our understanding of microbiota as a complex ecologic system increasingly undermines the value of the Henle-Koch postulates, the postulates were critical for establishing the causal role of microbes in human health. Early epidemiologists made tremendous strides with what are now relatively simple molecular tools, such as using microscopy for identification, showing agents not visible by microscope cause disease ("filterable viruses"), and detecting protective antibodies with hemagglutination assays. For example, Charles Louis Alphonse Laveran identified the protozoan that causes malaria using microscopy. 3 Charles Nicolle demonstrated that lice transmitted typhus by injecting a monkey with small amounts of infected louse. Nicolle also observed that some animals carry infection asymptomatically. 4 Wade Hampton Frost used the presence of protective antibodies in the serum of polio patients to monitor the emergence of polio epidemics.
5 Modern molecular biological techniques, such as those used in genomics, make it possible to distinguish between infectious agents of the same species with much finer discrimination than is possible using phenotypic comparisons. Increases in discriminatory power enable more definitive identification of reservoirs of infection and linkage of transmission events, such as identification of the source of a widely disseminated food-borne outbreak. Characterizing the genetics of human pathogens has revealed the tremendous heterogeneity of various infectious agents and the rapidity with which they evolve. This heterogeneity and rapid evolution help explain our difficulties in creating successful vaccines for the more heterogeneous organisms, such as Neisseria gonorrhoeae. Molecular analysis has also revealed the role of infectious agents in the initiation and promotion of diseases previously classified as chronic, such as cervical cancer. Further, molecular tools have enhanced our understanding of the epidemiology of infectious diseases by describing the transmission systems, identifying novel transmission modes and reservoirs, identifying characteristics of the infectious agent that lead to transmission and pathogenesis, revealing potential targets for vaccines and therapeutics, and recognizing new infectious agents. Molecular tools also make it possible to characterize microbial communities (also called microbiota) found in the environment and in and on the human host, and to describe how they influence health and disease. Instead of reducing the disease process to the pathogen that ultimately causes disease, by characterizing the microbiota we can examine how the presence of other microbes may moderate the ability of a pathogen to be transmitted, express virulence factors, or interact with the human host response and thus cause disease.
Considering microbes as communities requires drawing on ecological theory, but it helps epidemiologists understand why sewage contamination of drinking water usually leads to an outbreak by a single (rather than multiple) pathogen, how infection by a virus might enhance bacterial infection, or why no single pathogen has been linked to some diseases, like inflammatory bowel disease, even though the condition is characterized by an apparent disruption in normal microbiota. The combination of molecular tools with epidemiologic methods thus opens new opportunities for understanding disease transmission and etiology, and provides essential information to guide clinical treatment and to design and implement programs to prevent and control infectious diseases. The integration of molecular techniques with epidemiologic methods has already solved many mysteries. For example, the epidemiology of cervical cancer is closely analogous to that of a sexually transmitted infection, but it was only with the development of modern molecular techniques and their application in epidemiologic studies that a sexually transmitted virus, human papillomavirus, was identified as the cause of cervical cancer. 6, 7 During the early part of the epidemic of HIV, it was observed that many individuals with AIDS had Kaposi sarcoma, a rare type of cancer. The epidemiology and natural history of the sarcoma were different from those of Kaposi sarcoma found in individuals without HIV. The epidemiology suggested that Kaposi sarcoma was caused by an infectious agent, but no agent could be grown using standard culturing techniques. However, using a case-control design and molecular tools (which did not require growing the infectious agent), a virus was identified, now known as human herpesvirus 8 or Kaposi sarcoma virus. 8 Similarly, molecular typing confirmed epidemiologic observations that tuberculosis transmission can occur in relatively casual settings, 9 and differentiated diarrheas caused by E.
coli into pathotypes, which were distinguishable epidemiologically. 10 The powerful combination of molecular tools with epidemiology led to the rapid discovery of the cause of new diseases, such as severe acute respiratory syndrome (SARS). 11 Combining evolutionary theory, molecular techniques and standard epidemiologic methods confirmed that a dentist most likely deliberately infected some of his patients with HIV. 12 Many of these examples will be examined in more detail throughout the text; here, the power of merging molecular tools with epidemiology is illustrated using posttransfusion hepatitis. Much of what we know about hepatitis comes from studies of posttransfusion hepatitis. Because transfusion is a well-defined event, an association could be made with exposure to blood and blood products and differences in incubation period observed. Also, epidemiologic strategies could be used to help identify or confirm the etiology. Hepatitis means inflammation of the liver; the inflammation can be caused by alcohol and drug use in addition to several infectious agents. Many hepatitis symptoms are nonspecific: malaise, muscle and joint aches, fever, nausea or vomiting, diarrhea, and headache. The more specific symptoms pointing to liver inflammation, such as loss of appetite, dark urine, yellowing of eyes and skin, and abdominal discomfort, occur regardless of cause, and thus cannot be used to distinguish among them. The incubation period is variable, as short as 14 days for hepatitis A and as long as 180 days for hepatitis B. The multiple etiologies, range of incubation periods, and often nonspecific symptoms made the etiology and associated epidemiology for each etiology difficult to discern. The initial observations suggesting an infectious etiology occurred in the 1880s when hepatitis was noted to follow blood transfusions and injections (remember that needles were routinely reused until the 1980s).
But transmission by blood serum was not definitely demonstrated until outbreaks of hepatitis were noted following vaccination for yellow fever in the 1940s (some of the vaccines used human serum). 13 Before 1970, human blood and blood products used therapeutically had two sources: paid and volunteer donors. Reasoning that an infectious cause of hepatitis might occur more frequently among paid rather than volunteer blood donors, a retrospective epidemiologic study was conducted in 1964 comparing the risk of posttransfusion hepatitis among those receiving blood from the different donor groups. The difference in incidence was substantial: 2.8/1000 units for paid donors versus 0.6/1000 units for volunteer donors. This observation was followed by a randomized trial in which surgical patients were randomly allocated blood from paid versus volunteer donors. Half (51%) of those receiving commercial blood but none receiving volunteer donor blood developed hepatitis. This is strong epidemiologic evidence of an infectious etiology. However, it was not until hepatitis viruses could be detected that it was confirmed that hepatitis was infectious, that there were multiple transmission modes, and that viruses of very different types could cause hepatitis. Nonetheless, even before the detection of hepatitis viruses, the evidence from these epidemiologic studies was used to change medical practice: by eliminating commercial donors the incidence of posttransfusion hepatitis was decreased by 70% (Figure 1.1). 14 In 1965 the Australian antigen, a marker of hepatitis B, was discovered among individuals with hemophilia who had multiple blood transfusions. Because the antigen caused a reaction in serum from an Australian Aborigine, it was named Australian antigen. 16 After the antigen was associated with what was known as "long incubation period" hepatitis by Alfred M. Prince 17,18 the U.S.
Food and Drug Administration mandated screening of blood for hepatitis B in 1972. This further reduced the incidence of posttransfusion hepatitis by 25% (Figure 1.1). It also provided impetus for further studies to identify infectious causes of hepatitis. With the identification of the cause of hepatitis A, a picornavirus, it was possible to demonstrate that there were additional causes of posttransfusion hepatitis not attributable to either type A or B. Although it took 10 years and new molecular techniques to identify hepatitis C, 19 the ability to classify types A and B made it possible to learn quite a bit about the epidemiology of non-A, non-B hepatitis, including conducting outbreak investigations. 20 The identification of two surrogate markers of non-A, non-B hepatitis and subsequent inclusion of these markers in screening the blood supply reduced posttransfusion hepatitis rates from a range of 7% to 10% to a range of 2% to 3%, and further reductions followed the implementation of an anti-hepatitis C test. The history of posttransfusion hepatitis illustrates the great strengths of combining molecular biology with epidemiology. Epidemiologic observations led to the hypothesis that hepatitis could have an infectious cause, and that this infectious cause might occur with different frequency among paid and volunteer blood donors. Limiting the blood supply to volunteer donors dramatically reduced the incidence of posttransfusion hepatitis in the absence of any testing. However, until molecular tests were available, the types of hepatitis could not be distinguished, and the number of infectious agents causing hepatitis was not known. Laboratorians enhanced their searches for additional causes by selecting for testing specimens from individuals with hepatitis of unknown etiology. Finally, the availability of molecular tests enabled screening of the blood supply, enhancing the public's health.
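The paid versus volunteer donor comparison quoted above (2.8 versus 0.6 cases per 1000 units) can be summarized with two standard epidemiologic measures, the rate ratio and the rate difference. The following is a minimal sketch in Python; the function name `compare_rates` is a hypothetical helper introduced here for illustration, and only the two rates come from the text.

```python
def compare_rates(rate_exposed: float, rate_reference: float) -> tuple[float, float]:
    """Return (rate ratio, rate difference) for two incidence rates.

    Both rates must be expressed on the same scale (here, cases per
    1000 units of blood transfused).
    """
    if rate_reference <= 0:
        raise ValueError("reference rate must be positive")
    return rate_exposed / rate_reference, rate_exposed - rate_reference


# Rates per 1000 units transfused, as quoted in the text for the 1964 study.
paid_donor_rate = 2.8
volunteer_donor_rate = 0.6

ratio, difference = compare_rates(paid_donor_rate, volunteer_donor_rate)
print(f"rate ratio: {ratio:.1f}")
print(f"rate difference: {difference:.1f} per 1000 units")
```

On these figures the incidence among recipients of paid-donor blood is between four and five times that among recipients of volunteer-donor blood, which is the size of contrast that motivated the subsequent randomized trial.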
Modern molecular tools can detect trace amounts of material from small amounts of sample at a speed and cost unimaginable just a decade ago. It is possible to use the polymerase chain reaction to amplify nucleic acid from a drop of blood dried on filter paper and identify the genetic sequence of the animal from which the blood was taken. Newer technologies, such as pyrosequencing, enable the rapid sequencing of genetic material, so the entire sequence of a bacterial strain can be determined in less than a week. Modern techniques, such as immunoassays, can be exquisitely sensitive, enabling the detection of material present at parts per trillion. High throughput techniques, such as microarrays, enable rapid testing of large numbers of samples (1000 plus) or the testing of one sample for large numbers of markers. Consider: the entire genetic sequence (~29,700 base pairs) of the coronavirus that causes SARS was determined in 2003, less than 8 weeks after the viral RNA was first isolated. 21 Molecular tools continue to be developed at a rapid pace, and are increasingly available as commercial kits. This makes it possible to not only characterize populations in a new way, but also revisit and increase our understanding of well-described phenomena, providing a tremendous opportunity for the epidemiologist. However, these kits cannot be used blindly; their reliability and validity must be assessed in the population of interest, and the investigator should understand the variability of the test and its sensitivity to variation in time of collection, storage, and processing, and design the study accordingly. To do so requires both epidemiologic and laboratory expertise.
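One reason a test's validity must be assessed in the population of interest is that its predictive value depends on how common the condition is in that population. The sketch below applies Bayes' rule to compute positive and negative predictive values. The function name `predictive_values` and all numeric inputs (95% sensitivity, 95% specificity, prevalences of 30% and 1%) are hypothetical values chosen for illustration, not figures from the text.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied in a population.

    sensitivity and specificity must lie in (0, 1]; prevalence must lie
    strictly between 0 and 1 so that both denominators are positive.
    """
    if not (0 < sensitivity <= 1 and 0 < specificity <= 1):
        raise ValueError("sensitivity and specificity must be in (0, 1]")
    if not (0 < prevalence < 1):
        raise ValueError("prevalence must be strictly between 0 and 1")

    # Expected proportions of the population in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test characteristics applied to two hypothetical populations.
for prevalence in (0.30, 0.01):
    ppv, npv = predictive_values(0.95, 0.95, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.2f}, NPV {npv:.3f}")
```

With these assumed inputs, the same kit yields a positive predictive value of about 0.89 when 30% of those tested have the condition but only about 0.16 when 1% do, so a kit validated in a clinical population may perform very differently in a general-population survey.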
If the same infectious agent can be isolated from the epidemiologically identified source, the investigation is confirmed. Seroepidemiology studies use laboratory measures to describe the prevalence of previous exposure to an infectious agent. The purpose of a clinical epidemiologic study might be to identify early markers of disease or predictors of prognosis that can be identified in the laboratory. In environmental epidemiologic studies, laboratory assays may be used to measure exposure. In the ideal case, all these studies would be molecular epidemiology studies, not just because they use molecular tools, but because the choice of molecular tool and the implications of that choice are accounted for in the design, conduct, and analysis. Unfortunately, this is often not the case, which weakens any inferences that might be made from the study. A typical example is using a standard laboratory test for diagnosis. Is the study participant sick or not? Consider the disease strep throat. Strep throat comes with a sore throat, white or yellow spots on the throat or tonsils, and high fever. This sounds definitive, but the same clinical presentation can include other diseases, including infectious mononucleosis. Everyone who has strep throat, by definition, has group A Streptococcus growing in their throat. However, not everyone who has group A Streptococcus growing in their throat has a strep throat, even if they have a sore throat (the symptoms might be due to a viral infection). Further, there are several ways to detect the presence of group A Streptococcus, which vary in sensitivity, specificity, and type of information provided. Some give a yes/no answer, and others give information on the amount of bacteria present. Others test for specific characteristics of the bacteria, such as presence of a toxin Streptococcus can produce. 
If the individual has been treated recently with an antibiotic, it is likely that no group A Streptococcus will be detected by culture, but later, group A Streptococcus might be cultured from their throat! So, when and if the individual has symptoms, symptom severity and whether the individual has been treated matter. Using a laboratory test alone is not definitive for diagnosis, but it is necessary. However, if the study is to determine how many individuals are carrying group A Streptococcus, it may be sufficient, although the investigator must decide at what level of detection an individual is "carrying" the organism, and if those with symptoms will be considered as a different category. Further, if it matters what type of group A Streptococcus is found, another laboratory method may be optimal. This is a relatively simple example, but it illustrates how the research question constrains the molecular tool, and how the molecular tool can constrain the timing of specimen collection. How to choose an appropriate molecular tool, how it can constrain the study design, and what it implies about the conduct and analysis of the results are the focus of this text. This textbook is organized into three sections. Section 1, covered in Chapters 1 to 3, presents examples of how molecular tools enhance epidemiologic studies, with a special emphasis on infectious diseases. Chapter 4 is a primer of molecular techniques, and Chapter 5, a primer of epidemiologic methods. Section 2, covered in Chapters 6 and 7, presents technical material about standard molecular techniques. This section is not meant to teach the reader how to conduct specific assays, but to familiarize them with the vocabulary of current techniques, their applications, and any caveats about their use. The last section, covered in Chapters 8 to 11, discusses the implications of adding molecular techniques to epidemiologic studies. This includes discussions of study design, conduct, and analysis. 
Chapter 12 focuses on the ethical issues that arise from using biological samples. In the final chapter, I discuss possible hot areas for future research and development.
Molecular epidemiology: Focus on infection
100th anniversary of Robert Koch's Nobel Prize for the discovery of the tubercle bacillus
Nobel Prizes for discovering the cause of malaria and the means of bringing the disease under control: hopes and disappointments
The Nobel chronicles
The causal link between human papillomavirus and invasive cervical cancer: a population-based case-control study in Colombia and Spain
A cohort study of the risk of cervical intraepithelial neoplasia grade 2 or 3 in relation to papillomavirus infection
Identification of herpesvirus-like DNA sequences in AIDS-associated Kaposi's sarcoma
Transmission of Mycobacterium tuberculosis through casual contact with an infectious case
Escherichia coli as a cause of diarrhea
Severe acute respiratory syndrome: Identification of the etiological agent
Molecular epidemiology of HIV transmission in a dental practice
Yellow fever vaccine-associated hepatitis epidemic during World War II: Follow-up more than 40 years later
History of posttransfusion hepatitis
Chronic hepatitis C: the virus, its discovery and the natural history of the disease
Australia antigen and the revolution in hepatology
Relation between SH and Australia antigens
An antigen detected in the blood during the incubation period of serum hepatitis
Detection of antibody against antigen expressed by molecularly cloned hepatitis C virus cDNA: Application to diagnosis and blood screening for posttransfusion hepatitis
Non-A, non-B hepatitis: a prospective study of a hemodialysis outbreak with evaluation of a serologic marker in patients and staff
SARS CoV