key: cord-0000410-ydzxp0oe
authors: Chaussabel, Damien; Pascual, Virginia; Banchereau, Jacques
title: Assessing the human immune system through blood transcriptomics
date: 2010-07-01
journal: BMC Biol
DOI: 10.1186/1741-7007-8-84
sha: bc7b2271acba0248f021e9e11cb91cec6358d924
doc_id: 410
cord_uid: ydzxp0oe

Blood is the pipeline of the immune system. Assessing changes in transcript abundance in blood on a genome-wide scale affords a comprehensive view of the status of the immune system in health and disease. This review summarizes the work that has used this approach to identify therapeutic targets and biomarker signatures in the field of autoimmunity and infectious disease. Recent technological and methodological advances that will carry the blood transcriptome research field forward are also discussed.

outgoing lymphatic vessels, the cells again reach the bloodstream to be transported to tissues throughout the body. After patrolling these tissues, they gradually drift back into the lymphatic system to re-enter the blood and begin the cycle all over again. The complex patterns of recirculation depend on the state of cell activation, the adhesion molecules expressed by immune and endothelial cells, and the presence of chemotactic molecules that selectively attract particular populations of blood cells. Circulating immune cells are, in addition, exposed to factors that are released systemically.

A wide range of molecular and cellular profiling assays is currently available for the study of the human immune system (Figure 2). The level of sophistication of instruments such as polychromatic flow cytometers, one of the immunologist's favorite tools, has increased over the past few years. Major technological breakthroughs have also occurred in the fields of genomics and proteomics, thus creating today a unique opportunity for the study of human beings in health and disease, where inherent heterogeneity dictates that large collections of samples be analyzed.
Among the high-throughput molecular profiling technologies available today, genomic approaches are the most scalable, have the most breadth and robustness, and therefore are best suited for the study of human populations. The human genome can be investigated from two different angles: determining its makeup or measuring its output. Sequence variation can be detected using, for instance, single nucleotide polymorphism (SNP) chips, which permit the identification of common polymorphisms or rare mutations associated with diseases. Hundreds of thousands of SNPs can be typed using these platforms, yielding a genome-wide, hypothesis-free scan of genetic associations for a given phenotype of interest. Many such genome-wide association studies (often referred to as GWAS) have been published in recent years, a number of them investigating the genetic underpinning of immune-related diseases [1]. Notably, such studies have been useful to pinpoint genes and pathways that may be involved in the pathogenesis of [...] [2]. Associations between common genetic variants and resistance to infection have also been reported [3, 4]. However, parameters measured by this approach are determined by heredity and will not change throughout the life of an individual. This is in contrast to transcript abundance, which is the parameter measured by the second genome-wide profiling approach. Transcriptional activity is largely dependent on environmental factors and, as a result, RNA abundance will change dynamically over time. For instance, sets of transcripts may be induced in response to an infectious challenge and return to baseline levels following pathogen clearance. Dynamic changes in the cellular makeup of a tissue will also effect changes in transcript abundance that will be measured on a genome-wide scale.

Figure 1. Blood is the pipeline of the immune system. Transcriptional profiling in the blood consists of measuring RNA abundance in circulating nucleated cells. Changes in transcript abundance can result from exposure to host or pathogen-derived immunogenic factors (for example, pathogen-derived molecular patterns activating specialized pattern recognition receptors expressed at the surface of leukocytes) and/or changes in relative cellular composition (for example, influx of immature neutrophils occurring in response to bacterial infection). The main leukocyte populations circulating in the blood are represented in this figure. Each cell type has a specialized function. Eosinophils, basophils and neutrophils are innate immune effectors playing a key role in defense against pathogens. T lymphocytes are the mediators of the adaptive cellular immune response. Antibody-producing B lymphocytes (plasma cells) are key effectors of the humoral immune response. Monocytes, dendritic cells and B lymphocytes present antigens to T lymphocytes and play a central role in the development of the adaptive immune response. Blood leukocytes can be exposed in the circulation to factors released systemically from tissues where pathogenic processes take place. In addition, leukocytes will cross the endothelial barrier to reach local sites of inflammation. Dendritic cells exposed to inflammatory factors in tissues will be transported via the lymphatic system and reach lymph nodes via the afferent lymphatic vessels. These dendritic cells will encounter naïve T cells that are transported to the lymph node via high endothelial venules. 'Educated' T cells will then exit the lymph node via efferent lymph vessels that collect in the thoracic lymph duct, which in turn connects to the subclavian vein, at which point these T cells rejoin the blood circulation.
Figure 2. The number of high-throughput molecular and cellular profiling tools that can be used to profile the human immune system is increasing rapidly. Proteomic assays are used to determine antibody specificity or measure changes in serum levels of cytokines or chemokines using multiplex assays. Cellular profiling assays are used to phenotype immune cells based on intracellular or extracellular markers using polychromatic flow cytometry. In vitro cellular assays can measure innate or antigen-specific responsiveness in cells exposed to immunogenic factors. Genomic approaches consist of measuring abundance of cellular RNA and also microRNAs that are present in cells or in the serum. Other genomic approaches consist of determining gene sequence and function (for example, genome-wide association studies, RNA interference screens, exome sequencing).

Transcriptional profiles have been obtained from many human tissues (including, for instance, the skin [5, 6], muscle [7], liver [8, 9], kidney [10, 11] or brain [12]), but the status of the immune system can be best monitored by profiling transcript abundance in blood. Indeed, profiling transcript abundance in blood provides a 'snapshot' of the complex immune networks that operate throughout the entire body. However, while this has proven to be a valid approach to finding clues about pathogenesis as well as to identifying potential biomarkers [13-16], a number of challenges and limitations exist. Data interpretation is one of them. Firstly, the volume of data generated from such studies can be overwhelming, and it is necessary to integrate information from a multitude of sources (study design, quality control data, sample information, and importantly clinical information) in order for the results to be interpretable.
Secondly, the changes in transcript abundance observed in complex tissues such as blood can be caused not only by regulation of gene transcriptional activity but also by relative changes in abundance of cell populations expressing transcripts at constant levels. Thirdly, in addition to pathogenic processes, a number of factors may affect blood transcript abundance and confound the analysis. Medications and co-morbidities are two such factors that often restrict patient selection and complicate data interpretation. This review will discuss some of the recently developed strategies that address some of these limitations.

Real-time PCR technology is currently considered the gold standard for the analysis of gene expression. However, it can be used to measure abundance of only a limited number of transcripts. Introduced over 10 years ago, DNA microarrays are now in routine use and can measure transcript abundance on a genome-wide scale. This technology relies on dense arrays of oligonucleotide probes that will capture complementary sequences present in biological samples at various concentrations. The probes can be deposited on a solid surface (printed microarrays), synthesized in situ (Affymetrix GeneChips), or bound to glass beads lodged into wells etched in the surface of a glass slide (Illumina BeadArrays). The labeled material captured by the microarray is imaged and relative abundance determined based on the strength of the signal produced by each oligonucleotide feature. It should be noted that, while they provide a means to survey transcript abundance on a genome-wide scale, the sensitivity of microarray assays is low compared to other approaches such as real-time PCR. A microarray is not a fully quantitative assay and changes in transcript abundance must be measured in reference to control samples that need to be included in each study.
However, some of these limitations may be lifted by methods relying on high-throughput sequencing for the genome-wide measurement of RNA abundance [17]. Building on the legacy of the SAGE (serial analysis of gene expression) technology introduced in the 1990s, RNA sequencing (RNA-seq) [18] uses either total or fractionated RNA, for example poly(A)+, as a starting point. This material is converted to a library of cDNA fragments. High-throughput sequencing of such fragments yields short sequences or reads that are typically 30 to 400 bp in length, depending on the technology platform used. For a given sample, tens of millions of such sequences will then be uniquely mapped against a reference genome. The higher the level of expression of a given gene, the higher the number of reads that will be aligned against it (Figure 3). Thus, this approach does not rely on probe design and provides several types of information, including not only transcript abundance but also transcriptome structure (splice variants), profiles of non-coding RNA species, and genetic polymorphisms. RNA-seq is expected to become sufficiently cost-effective and practical that it will eventually supersede microarray technologies.

Other technologies should be considered for the profiling of focused sets of genes. Nanostring technology can, for instance, detect the abundance of up to 500 transcripts with high sensitivity [19]. The approach is 'digital' since it counts individual RNA molecules using strings of fluorochromes as reporters to identify the different RNA species. Other technology platforms developed by, among others, Luminex, High Throughput Genomics or Fluidigm round out the offering for 'subgenome' transcript profiling.

The field of autoimmunity has proven a fertile ground for blood transcriptional studies.
Alterations in transcript abundance in the blood of patients reflect the sustained response against self-antigens and, more generally, uncontrolled inflammatory processes. Such diseases often present with recurring-remitting patterns of activity, with episodes of flaring that may be reflected by fluctuations in transcript abundance. The work initially focused on diseases with clear systemic involvement such as systemic lupus erythematosus (SLE) [20, 21]. Multiple cell types and soluble mediators, including IL10 [22, 23] and IFNγ [24-26], have been proposed to be at the center of lupus pathogenesis. While some scattered evidence indicated the potential role of type I interferon in lupus, several observations did not support the hypothesis: first, not every SLE patient has detectable serum type I IFN levels [27]; second, dysregulation of type I IFN production is not found in most murine SLE models [28]; and third, genetic linkage and association studies had not identified candidate lupus susceptibility genes within the IFN pathway [29]. However, in one of our earliest microarray studies we demonstrated that all but one of the pediatric patients exhibited upregulation of IFN-inducible genes, and the only patient lacking this signature had been in remission for over 2 years [20]. In addition, it was found that treating SLE patients with high dose IV steroids, which are used to control disease flares, results in the silencing of the IFN signature. A surprise from these initial studies was the absence of type I IFN gene transcripts in the face of an abundance of IFN-inducible ones in the blood cells of SLE patients. A likely explanation is that the cells producing type I IFN, and therefore transcribing these genes, migrate to sites of injury. Altogether, results from microarray studies played a key role in convincing the community of the potential importance of type I IFN in SLE pathogenesis [15, 30-34].
A phase Ia trial to evaluate the safety, pharmacokinetics, and immunogenicity of anti-IFNα monoclonal antibody (mAb) therapy in adult SLE patients was recently conducted [35]. The antibody elicited a specific and dose-dependent inhibition of overexpression of type I IFN-inducible genes in both whole blood and skin lesions from SLE patients, at both the transcript and protein levels. As expected, overexpression of BLyS/BAFF, a type I IFN-inducible gene, also decreased with treatment. Thus, this first trial supports the proposed central role of type I IFN in human SLE.

Figure 3. Following extraction, RNA is used as a template and amplified in a labeling reaction. The labeled material captured by the microarray is imaged and relative abundance determined based on the strength of the signal produced by the fluorochromes that serve as reporters in this assay. The Nanostring technology measures RNA abundance at the single molecule level. RNA serves as starting material for this assay, which does not involve the use of enzymes for amplification or labeling. Capture and reporter probes form complexes in solution with RNA molecules. These complexes are captured on a solid surface and imaged. Molecule counts are generated based on the number of reporter probes detected on the image. The reporter consists of a string of seven fluorochromes, with four different colors available to fill each position. Up to 500 different transcripts can be detected in a single reaction on this platform. For RNA sequencing (RNA-seq) the starting RNA population must first be converted into a library of cDNA fragments. High-throughput sequencing of such fragments yields short sequences or reads that are typically 30 to 400 bp in length. For a given sample tens of millions of such sequences will then be uniquely mapped against a reference genome. The density of coverage for a given gene determines its relative level of expression. Similarities and differences between these technology platforms should be noted. For instance, microarrays and Nanostring technologies rely on oligonucleotide probes to capture complementary target sequences. Nanostring and RNA-seq technologies measure abundance at the single molecule level, with results expressed as molecule counts and sequence coverage, respectively. Microarray and RNA-seq technologies require extensive sample processing, which includes amplification steps. dsDNA, double-stranded DNA.

Systemic onset juvenile idiopathic arthritis (SoJIA) is another disease with systemic involvement that greatly benefited from the study of blood transcriptional profiles with the development of both therapeutic and diagnostic modalities [14, 16, 36, 37]. Diseases with specific organ involvement have also been the subject of significant, yet not always extensive, blood profiling efforts. Blood signatures have, for instance, been obtained from patients with multiple sclerosis [38, 39]. Given the inaccessibility of the brain, blood constitutes a particularly attractive source of surrogate molecular markers for this disease. These efforts have yielded a systemic signature and identified potential predictive markers of clinical relapse and response to treatment [40-42]. Transcriptional signatures have also been generated in the context of dermatologic diseases. In this case, the target organ being readily accessible, efforts have focused on profiling transcript abundance in skin tissues [43, 44]. However, systemic involvement has been recognized in recent years to be an important component of autoimmune skin diseases and unique blood transcriptional profiles have also been identified in patients with, for example, psoriasis [45-47]. Blood transcriptional profiles have been generated in the context of many other autoimmune diseases.
Indeed, the range of autoimmune/autoinflammatory diseases that have been investigated encompasses SLE [20, 21, 48, 49], juvenile idiopathic arthritis [16, 50-53], multiple sclerosis [54, 55], rheumatoid arthritis [56-59], Sjögren's syndrome [60], diabetes [61, 62], inflammatory bowel disease [63], psoriasis and psoriatic arthritis [45, 47], inflammatory myopathies [64, 65], scleroderma [66, 67], vasculitis [68] and anti-phospholipid syndrome [69]. The body of work that focuses on blood transcript profiling in the context of autoimmune diseases has been covered at length in a recent review [70].

Global changes in transcript abundance have also been measured in the blood of patients with infectious diseases. In this context, alterations of blood transcriptional profiles are a reflection of the immunological response mounted by the host against pathogens. This response is initiated by specialized receptors expressed at the surface of host cells recognizing pathogen-associated molecular patterns [71]. Different classes of pathogens signal through different combinations of receptors, eliciting in turn different types of immune responses [72]. This translates experimentally into distinct transcriptional programs being induced upon exposure of immune cells in vitro to distinct classes of infectious agents [73-75]. Similarly, patterns of transcript abundance measured in the blood of patients with infections caused by different etiological agents were found to be distinct [13]. Predictably, dramatic changes were observed in the blood of patients with systemic infections (for example, sepsis) [76, 77]. However, profound alterations in patterns of transcript abundance were also found in patients with localized infections (for example, upper respiratory tract infection, urinary tract infections, pulmonary tuberculosis, skin abscesses) [13, 16, 78].
Measuring changes in host transcriptional profiles may therefore prove of diagnostic value even in situations where the causative pathogenic agent is not present in the test sample. Importantly, it may also help ascertain the severity of the infection and monitor its course. Infections often present as acute clinical events; thus, it is important to capture dynamic changes in transcript abundance that occur during the course of the infection from the time of initial exposure. Blood signatures have been described in the context of acute infections caused by a wide range of pathogenic parasites, viruses and bacteria, including Plasmodium [79, 80], respiratory viruses (influenza, rhinovirus, respiratory syncytial virus) [13, 81-84], dengue virus [85, 86], and adenovirus [82], as well as Salmonella [87], Mycobacterium tuberculosis [78], Staphylococcus aureus [88], Burkholderia pseudomallei [76] and the general context of bacterial sepsis [77, 89-91]. Some of those pathogens will persist and establish chronic infections (for example, human immunodeficiency virus and Plasmodium) that may lead to a state of latency (for example, tuberculosis), and transcript profiling may be used in those situations as a surveillance tool for monitoring disease progression or reactivation. Blood profiling of infectious diseases remains limited in scale. In particular, additional studies will be necessary to ascertain dynamic changes occurring over time.

In addition to autoimmune and infectious diseases, blood transcript profiling studies have been carried out in the cancer research field. While hematological malignancies have led the way (reviewed in [92]), blood profiles have also been obtained more recently from patients with solid organ tumors [93]. Notably, these signatures can reflect not only the immunological or physiological changes effected by cancers but also the presence of rare tumor cells in the circulation [94-96].
Blood signatures have also been obtained from solid organ transplant recipients in the context of both tolerance [97-99] and graft rejection [10, 100, 101]. While such signatures can also be detected in biopsy material [102-104], blood offers the distinct advantage of being accessible for safely monitoring molecular changes on a routine basis. Some work has also been done in the context of cardiovascular diseases where inflammation is known to play an important role. Hence, profiles have been identified in a wide range of conditions, including stroke, chronic heart failure or acute coronary syndrome [105-108]. The body of published work is too large to be cited in this review, and it is likely to be only the tip of the iceberg, with a lot more unpublished data scattered throughout public and private repositories. Other efforts have yielded, for instance, blood transcriptional signatures in patients with neurodegenerative diseases [109-111], and those associated with disease exacerbation or responsiveness to glucocorticoids in patients with asthma [112, 113], and with responses to environmental exposure [114-116], exercise [117, 118] or even laughter [119]. Unfortunately, too many published studies are underpowered and sometimes lack even the most rudimentary validation steps. All too often primary data are not available for reanalysis either, reflecting a lack of enforcement of editorial policies, or the absence thereof in some journals. Hence, one of the main challenges for this field is to move beyond the proof of principle stage and consolidate the wealth of data being generated.

Collectively, studies published thus far demonstrate that alterations in transcript abundance can be detected on a genome-wide scale in the blood of patients with a wide range of diseases. This statement is far from trivial given the skepticism that initially met studies investigating the blood transcriptome of patients.
We have also learned that: 1) multiple diseases can share components of the blood transcriptional profile (as is the case, for instance, for inflammation or interferon signatures); 2) while no single element of the profile may be specific to any given disease, it is the combination of those elements that makes a signature unique; and finally, 3) the work accomplished to date highlights the importance of carrying out analyses that directly compare transcriptional profiles across diseases. Indeed, much can be learned, for instance, about autoimmunity from studying responses to infection, and vice versa. Furthermore, such efforts may eventually lead us closer to a molecular classification of diseases. First, however, technological and methodological advances are necessary for the blood transcriptome research field to move beyond the proof of principle stage.

Recent progress in blood transcriptome research has been possible thanks to the development of robust sample collection techniques and the introduction of high-throughput gene expression microarray platforms. Such advances have been necessary but the margin for progression in the field is still very significant. We describe here some of the current hurdles and discuss potential solutions for overcoming them.

For years the scale of blood transcriptional studies has been constrained by the cost of the technology. With the price tag on a commercial whole genome microarray below the US$100 mark, this is not the case anymore. Thus, data management has now become the first essential step to making large scale molecular profiling a viable proposition. Beyond storing the output of microarray instruments, data management must capture and organize information that is essential for the interpretation of the results (Figure 4). This includes sample information, data quality metrics, clinical information collected at the time of sampling, details about the experimental design, and materials and methods.
Capturing such information ensures that the large volumes of data generated, which are often not published immediately, will remain exploitable for years to come. This point has become critical given the fact that results from genome-wide profiling studies can never be exploited to their fullest extent and possess considerable cumulative value when re-analyzed collectively. Notably, the results generated by other cellular and molecular profiling platforms will also need to be integrated in order to complete the picture. Therefore, implementing effective data management solutions and practices is essential to sustain the necessary increase in the scale of blood transcriptional studies (Figure 4) [120]. Unfortunately, implementing data management solutions in the laboratory is often an expensive proposition, requiring customization of off-the-shelf products or development of custom software adapted to handle specific workflows. Managing data also takes time and requires dedicated personnel. Thus, while the need is widely perceived, the commitment and steps necessary to implement effective data management solutions and practices are rarely adopted.

A myriad of approaches have been developed for the analysis of genome-wide transcriptional profiling data [121-124]. However, there is no silver bullet when it comes to microarray data analysis. The challenges encountered are several: 1) dimensionality, or how to cope with the fact that the number of parameters measured exceeds by several orders of magnitude the number of conditions included in most experiments; 2) noise: a direct consequence of the first point is that results from microarray analyses are particularly permissive to noise (false discovery); 3) 'seeing' the data: data visualization is critical as it helps promote insight and supports data interpretation; 4) biological context: it is important to keep the biology in sight at all times.
Indeed, while it is easy to become absorbed by the data, it is essential to use biological knowledge when designing analysis strategies. Finally, there is hardly a one-size-fits-all approach to microarray data analysis and what works in one situation may not be universally applicable. Indeed, the most common response from experts when questioned on the best way to analyze a given dataset is that 'it depends…': it depends, for instance, on the extent of the differences being observed or on the variability inherent to a given disease or study population; it depends on what questions are being asked; or it can depend on whether follow-up confirmatory experiments are planned. In Table 1 we provide a data mining primer that explains the basic steps involved in microarray data analysis and the considerations that arise [125-129].

Figure 4. Data management is key to progress. Extensive cellular and molecular profiling of human subjects generates vast amounts of disparate data. Effective data management and integration solutions are essential to the preservation of this information in an interpretable form. Thus, data management efforts occurring 'behind the scenes' have an essential role to play in realizing the full potential of high-throughput profiling approaches in human subjects.

Ad hoc data mining approaches can be developed to meet specific needs. For instance, we have developed a data mining strategy for the specific purpose of analyzing blood transcriptional profiles [15]. This approach simply consists of a priori grouping of sets of genes with similar transcriptional patterns. This is repeated for several different datasets and subsequently, when comparing the cluster membership of all the genes across those datasets, the genes with similar membership are grouped together to form what we have termed a transcriptional module.
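The module construction described above can be illustrated with a minimal Python sketch. It assumes that genes have already been clustered separately within each dataset, and it simplifies 'similar membership' to identical membership across all datasets. The function name `build_modules`, the `min_size` parameter and the gene labels are hypothetical and are not taken from the published framework [15].

```python
from collections import defaultdict


def build_modules(cluster_assignments, min_size=2):
    """Group genes that fall in the same cluster in every dataset.

    cluster_assignments: list of dicts, one per dataset, mapping a gene
    label to the cluster it was assigned to in that dataset. Cluster
    labels are only meaningful within their own dataset.
    """
    # Only genes clustered in every dataset can be compared across datasets.
    shared = set(cluster_assignments[0])
    for assignment in cluster_assignments[1:]:
        shared &= set(assignment)

    # The membership signature of a gene is its cluster label in each dataset.
    groups = defaultdict(list)
    for gene in sorted(shared):
        signature = tuple(assignment[gene] for assignment in cluster_assignments)
        groups[signature].append(gene)

    # Assumption: very small groups are not treated as modules.
    return [genes for genes in groups.values() if len(genes) >= min_size]


# Hypothetical cluster assignments from two datasets (placeholder gene labels).
dataset_a = {"g1": 0, "g2": 0, "g3": 1, "g4": 1, "g5": 2}
dataset_b = {"g1": 3, "g2": 3, "g3": 1, "g4": 1, "g5": 1}

modules = build_modules([dataset_a, dataset_b])
print(modules)
```

In this toy input, g1 and g2 cluster together in both datasets, as do g3 and g4, so each pair forms a module. g5 joins g3 and g4 in only one of the two datasets and is therefore left out.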
Structuring the data permits focusing downstream statistical testing on these sets of transcripts that form coherent transcriptional and functional modular units. Th is is in contrast with more traditional approaches that rely on iterative statistical testing for thousands of individual transcripts that are treated as independent variables. Th e modular transcriptional framework that we have developed reduces the number of variables by collapsing sets of coordinately expressed genes into a new entity, the module. Reducing data dimensionality as such can: 1) facilitate functional inter pretation; 2) enable comparative analyses across multiple datasets and diseases; 3) minimize noise and improve robustness of biomarker signatures; and 4) yield multivariate metrics that can be used at the bedside [15] . Data visualization is also of critical importance for the interpretation of large-scale datasets. We have devised a straightforward visualization scheme for mapping global transcriptional changes for individual diseases on a modular basis ( Figure 5 ).Briefl y, diff erences in expression levels between study groups are displayed for each module on a grid. Each position on the grid is assigned to a given module; a red spot indicates an increase and a blue spot a decrease in transcript abundance. Th e spot intensity is determined by the proportion of transcripts reaching signifi cance for a given module. A posteriori, biological interpretation has linked several modules to immune cells or pathways (see legend of Figure 5 ). Hence, in the example provided in Figure 5 Here we provide basic analysis steps and important considerations for microarray data analysis: -Per-chip normalization: This step controls for array-wide variations in intensity across multiple samples that form a given dataset. 
Arrays, as with all fl uorescence based assays, are subject to signal variation for a variety of reasons, including the effi ciency of the labeling and hybridization reactions and possibly other, less well defi ned variables, such as reagent quality and sample handling. To control for this, samples are normalized by fi rst subtracting background and then employing a normalization algorithm to rescale the diff erence in overall intensity to a fi xed intensity level for all samples across multiple arrays. -Data fi ltering: Typically more than half of the oligonucleotide probes present on a microarray do not detect a signal for any of the samples in a given analysis. Thus, a detection fi lter is applied to exclude these transcripts from the original dataset. This step avoids the introduction of unnecessary noise in downstream analyses. -Unsupervised analysis: The aim of this analysis is to group samples on the basis of their molecular profi les without a priori knowledge of their phenotypic classifi cation. The fi rst step, which functions as a second detection fi lter, consists of selecting transcripts that are expressed in the dataset and display some degree of variability, which will facilitate sample clustering. For instance, this fi lter could select transcripts with expression levels that deviate by at least two-fold from the median intensity calculated across all samples. Importantly, this additional fi lter is applied independently of any knowledge of sample grouping or phenotype, which makes this type of analysis 'unsupervised' . Next, pattern discovery algorithms are often applied to identify 'molecular phenotypes' or trends in the data. -Clustering: Clustering is commonly used for the discovery of expression patterns in large datasets. Hierarchical clustering is an iterative agglomerative clustering method that can be used to produce gene trees and condition trees. 
Condition tree clustering groups samples based on the similarity of their expression profiles across a specified gene list. Other commonly employed clustering algorithms include k-means clustering and self-organizing maps.
- Class comparison: Such analyses identify genes that are differentially expressed among study groups ('classes') and/or time points. The methods for analysis are chosen based on the study design. For studies with independent observations and two or more groups, t-tests, ANOVA, Mann-Whitney U tests, or Kruskal-Wallis tests are used. Linear mixed model analyses are chosen for longitudinal studies.
- Multiple testing correction: Multiple testing correction (MTC) methods provide a means to mitigate the level of noise in sets of transcripts identified by class comparison (that is, to reduce the number of false positives). While it reduces noise, MTC also raises the false negative rate as a result of dampening the signal. The methods available are characterized by varying degrees of stringency, and therefore they produce gene lists with different levels of robustness.
  • Bonferroni correction is the most stringent method used to control the familywise error rate (probability of making one or more type I errors) and can drastically reduce false positive rates. Conversely, it increases the probability of having false negatives.
  • Benjamini and Hochberg false discovery rate [125] is a less stringent MTC method and provides a good balance between discovery of statistically significant genes and limitation of false positives. With this procedure and a threshold of 0.01, about 1% of the transcripts called significant are expected to be false positives, that is, identified as significant by chance alone.
- Class prediction: Class prediction analyses assess the ability of gene expression data to correctly classify a study subject or sample. K-nearest neighbors is a commonly used technique for this task.
Other available class prediction procedures include, but are not limited to, discriminant analysis, general linear model selection, logistic regression, distance scoring, partial least squares, partition trees, and radial basis machine.
- Sample size: The number of samples necessary for the identification of a robust signature is variable. Indeed, sample size requirements will depend on the amplitude of the difference between, and the variability within, study groups. A number of approaches have been devised for the calculation of sample size for microarray experiments, but to date little consensus exists [126-129]. Hence, best practices in the field consist of the utilization of independent sets of samples for the purpose of validating candidate signatures. Thus, the robustness of the signature identified will rely on a statistically significant association between the predicted and true phenotypic class in the first and the second test sets.

modules. It should also be noted that no changes were observed for other modules, such as module M3.1, which includes interferon-inducible genes, abundance of which would be increased in the context of a viral infection.

MicroRNA (miRNA) control has emerged as a critical regulatory circuit of the immune system. Measuring changes in miRNA abundance in the blood of human subjects in health and disease is therefore a promising new field of investigation. These short, non-coding, single-stranded RNAs, about 22 nucleotides in length, have been found to play essential regulatory roles [130-132]. These molecules exhibit highly specific, regulated patterns of expression and control protein expression by translational repression, mRNA cleavage, or promotion of mRNA decay. Interestingly, thanks to their small size, miRNA molecules are stable and can be measured not only in blood cells but also in circulation in the serum [133].
They are thus not only potentially important contributors to immune function, but also potential sources of biomarkers.

Blood transcriptome research will also benefit from conceptual advances that may help address shortcomings inherent to whole blood profiling. First, blood is a complex tissue and changes in transcript abundance can be attributed to either transcriptional regulation or relative changes in composition of leukocyte populations. Two approaches exist for 'deconvoluting' these two phenomena. First, one can isolate and individually profile different cell populations present in the blood. This approach may also permit the identification of transcripts expressed at low levels or the detection of differences in expression that would otherwise be drowned in whole blood [134, 135]. However, isolation methods may introduce technical bias, and require extensive sample processing. A second approach consists of deconvoluting whole blood transcriptional profiles 'in silico'. This type of analysis attempts to deduce cellular composition or cell-specific levels of gene expression using statistical methodologies [136-141]. Finally, we must also keep in mind that the immune status of a human subject is not entirely reflected by the subject's blood profile obtained at the steady state. Indeed, an individual's capacity to respond to innate as well as antigen-specific immune signals may also provide useful and complementary information.

In conclusion, blood transcript profiling has earned its place in the molecular and cellular profiling armamentarium used to study the human immune system. Changes in transcript abundance recapitulate the influence of genetic, epigenetic, cellular and environmental factors. Initially considered to belong to the 'cutting edge', this approach has become both robust and practical.
As discussed in this review, it has become a mainstay for the study of immune function in patients with a wide range of diseases. Furthermore, recent studies have demonstrated the utility of blood transcriptome profiling for monitoring immune responses to drugs or vaccines [35, 142, 143]. Thus, blood transcript profiling is developing into a mainstream tool for the assessment of the status of the human immune system.

Figure 5. Relative changes in transcript abundance in the blood of patients with S. aureus infection compared to that of healthy controls are recorded for a set of 28 transcriptional modules. Colored spots represent relative increase (red) or decrease (blue) in transcript abundance (P < 0.05, Mann-Whitney) within a module. The legend shows functional interpretation for this set of modules. Fingerprints have been generated for two independent cohorts of subjects (divided into a training set used in the discovery phase, n = 30, and an independent test set used in the validation phase, n = 32).
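As an illustration of the multiple testing correction step listed among the analysis steps above, the following minimal Python sketch contrasts the Bonferroni correction with the Benjamini and Hochberg procedure [125] on a list of per-transcript P values. The function names, the example P values and the 0.05 threshold are assumptions chosen for illustration only; they are not taken from the studies reviewed here.

```python
def bonferroni(pvalues, alpha=0.05):
    """Flag P values that pass a Bonferroni-corrected threshold (alpha / m)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, q=0.05):
    """Flag P values declared significant by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    # Indices of the P values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            k_max = rank
    # Every hypothesis ranked at or below k_max is rejected.
    significant = [False] * m
    for idx in order[:k_max]:
        significant[idx] = True
    return significant


# Hypothetical P values, one per transcript (illustrative only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni(pvals)), "transcripts pass Bonferroni")
print(sum(benjamini_hochberg(pvals)), "transcripts pass Benjamini-Hochberg")
```

With these ten hypothetical P values, one transcript passes the Bonferroni threshold and two pass the Benjamini and Hochberg procedure, which is consistent with the difference in stringency described above.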
1. Recent advances in the genetics of autoimmune disease
2. Detecting shared pathogenesis from the shared genetics of immune-related diseases
3. A whole-genome association study of major determinants for host control of HIV-1
4. Genome-wide association of IL28B with response to pegylated interferon-alpha and ribavirin therapy for chronic hepatitis C
5. A microarray analysis of temporal gene expression profiles in thermally injured human skin
6. Comparison of normal human skin gene expression using cDNA microarrays
7. IL-1 receptor antagonism and muscle gene expression in patients with type 2 diabetes
8. Use of cDNA microarrays to analyze dioxin-induced changes in human liver gene expression
9. Microarray analysis of liver gene expression in iron overloaded patients with sickle cell anemia and beta-thalassemia
10. Kidney transplant rejection and tissue injury by gene profiling of biopsies and peripheral blood lymphocytes
11. Molecular correlates of renal function in kidney transplant biopsies
12. Comparative gene expression analysis of blood and brain provides concurrent validation of SELENBP1 up-regulation in schizophrenia
13. Chaussabel D: Gene expression patterns in blood leukocytes discriminate patients with acute infections
14. Role of interleukin-1 (IL-1) in the pathogenesis of systemic onset juvenile idiopathic arthritis and clinical response to IL-1 blockade
15. A modular analysis framework for blood genomics studies: application to systemic lupus erythematosus
16. Blood leukocyte microarrays to diagnose systemic onset juvenile idiopathic arthritis and follow the response to IL-1 blockade
17. Sequence census methods for functional genomics
18. A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome
19. Direct multiplexed measurement of gene expression with color-coded probe pairs
20. Interferon and granulopoiesis signatures in systemic lupus erythematosus blood
21. Interferon-inducible gene expression signature in peripheral blood cells of patients with severe lupus
22. Role of interleukin 10 in the B lymphocyte hyperactivity and autoantibody production of human systemic lupus erythematosus
23. Clinical and biologic effects of anti-interleukin-10 monoclonal antibody administration in systemic lupus erythematosus
24. Up-regulated MHC-class II expression and gamma-IFN and soluble IL-2R in lupus nephritis
25. IFN-gamma is essential for the development of autoimmune glomerulonephritis in MRL/lpr mice
26. Decreased production of interleukin-12 and other Th1-type cytokines in patients with recent-onset systemic lupus erythematosus
27. Systemic lupus erythematosus: presence in human serum of an unusual acid-labile leukocyte interferon
28. What do mouse models teach us about human SLE?
29. Update on human systemic lupus erythematosus genetics
30. Shared and unique gene expression in systemic lupus erythematosus depending on disease activity
31. Longitudinal expression of type I interferon responsive genes in systemic lupus erythematosus
32. Association of a gene expression profile from whole blood with disease activity in systemic lupus erythaematosus
33. Gene expression in systemic lupus erythematosus: bone marrow analysis differentiates active from inactive disease and reveals apoptosis and granulopoiesis signatures
34. Interferon-regulated chemokines as biomarkers of systemic lupus erythematosus disease activity: a validation study
35. Neutralization of interferon-alpha/beta-inducible genes and downstream effect in a phase I trial of an anti-interferon-alpha monoclonal antibody in systemic lupus erythematosus
36. How the study of children with rheumatic diseases identified interferon-alpha and interleukin-1 as novel therapeutic targets
37. Microarray-based identification of novel biomarkers in IL-1-mediated diseases
38. Gene expression profile in multiple sclerosis patients and healthy controls: identifying pathways relevant to disease
39. Blood transcriptional signatures of multiple sclerosis: unique gene expression of disease activity
40. Pharmacogenomics of interferon-beta therapy in multiple sclerosis: baseline IFN signature determines pharmacological differences between patients
41. Prediction of acute multiple sclerosis relapses by transcription levels of peripheral blood cells
42. Zinc-ion binding and cytokine activity regulation pathways predicts outcome in relapsing-remitting multiple sclerosis
43. Distinct patterns of gene expression in the skin lesions of atopic dermatitis and psoriasis: a gene microarray analysis
44. High expression levels of keratinocyte antimicrobial proteins in psoriasis compared with atopic dermatitis
45. A distinct inflammatory gene expression profile in patients with psoriatic arthritis
46. Gene expression profiling of peripheral blood mononuclear leukocytes from psoriasis patients identifies new immune regulatory molecules
47. Microarray analyses of peripheral blood cells identifies unique gene expression signature in psoriatic arthritis
48. Microarray analysis of gene expression in lupus
49. Analysis of gene expression profiles in human systemic lupus erythematosus using oligonucleotide microarray
50. Role of interleukin-1 (IL-1) in the pathogenesis of systemic onset juvenile idiopathic arthritis and clinical response to IL-1 blockade
51. Specific gene expression profiles in systemic juvenile idiopathic arthritis
52. Gene expression profiling of peripheral blood from patients with untreated new-onset systemic juvenile idiopathic arthritis reveals molecular heterogeneity that may predict macrophage activation syndrome
53. Subtype-specific peripheral blood gene expression profiles in recent-onset juvenile idiopathic arthritis
54. Impaired expression of peripheral blood apoptotic-related gene transcripts in acute multiple sclerosis relapse
55. Gene expression changes in peripheral blood mononuclear cells from multiple sclerosis patients undergoing beta-interferon therapy
56. Molecular profile of peripheral blood mononuclear cells from patients with rheumatoid arthritis
57. Rheumatoid arthritis subtypes identified by genomic profiling of peripheral blood cells: assignment of a type I interferon signature in a subpopulation of patients
58. Gene profiling in white blood cells predicts infliximab responsiveness in rheumatoid arthritis
59. Peripheral blood gene expression profiling in rheumatoid arthritis
60. Peripheral blood gene expression profiling in Sjogren's syndrome
61. Gene expression in peripheral blood mononuclear cells from children with diabetes
62. Gene expression profiles in peripheral blood mononuclear cells reflect the pathophysiology of type 2 diabetes
63. Molecular classification of Crohn's disease and ulcerative colitis patients using transcriptional profiles in peripheral blood mononuclear cells
64. Interferon-alpha/beta-mediated innate immune mechanisms in dermatomyositis
65. Reed AM: An interferon signature in the peripheral blood of dermatomyositis patients is associated with disease activity
66. Signatures of differentially regulated interferon gene expression and vasculotrophism in the peripheral blood cells of systemic sclerosis patients
67. A macrophage marker, Siglec-1, is increased on circulating monocytes in patients with systemic sclerosis and induced by type I interferons and toll-like receptor agonists
68. Leukocyte gene expression signatures in antineutrophil cytoplasmic autoantibody and lupus glomerulonephritis
69. Gene-expression patterns predict phenotypes of immune-mediated thrombosis
70. A genomic approach to human autoimmune diseases
71. Innate immune recognition
72. Toll-like receptors in the induction of the innate immune response
73. Human macrophage activation programs induced by bacterial pathogens
74. The plasticity of dendritic cell responses to pathogens and their components
75. Unique gene expression profiles of human macrophages and dendritic cells to phylogenetically distinct parasites
76. Genomic transcriptional profiling identifies a candidate blood biomarker signature for the diagnosis of septicemic melioidosis
77. Gene-expression profiling of peripheral blood mononuclear cells in sepsis
78. Candidate biomarkers for discrimination between infection and disease caused by Mycobacterium tuberculosis
79. Genomewide analysis of the host response to malaria in Kenyan children
80. Malaria primes the innate immune response due to interferon-gamma induced enhancement of toll-like receptor expression and function
81. Expression profile of immune response genes in patients with severe acute respiratory syndrome
82. Gene transcript abundance profiles distinguish Kawasaki disease from adenovirus infection
83. Gene expression signatures diagnose influenza and other symptomatic respiratory viral infections in humans
84. Epidemic Outbreak Surveillance (EOS): Surveillance of transcriptomes in basic military trainees with normal, febrile respiratory illness, and convalescent phenotypes
85. Differences in global gene expression in peripheral blood mononuclear cells indicate a significant role of the innate responses in progression of dengue fever but not dengue hemorrhagic fever
86. Gene expression profiling during early acute febrile stage of dengue infection can predict the disease outcome
87. Transcriptional response in the peripheral blood of patients infected with Salmonella enterica serovar Typhi
88. Enhanced monocyte response and decreased central memory T cells in children with invasive Staphylococcus aureus infections
89. Genomics of Pediatric SIRS/Septic Shock Investigators: Genomic expression profiling across the pediatric systemic inflammatory response syndrome, sepsis, and septic shock spectrum
90. Gene profiling in human blood leucocytes during recovery from septic shock
91. Gene expression profiles differentiate between sterile SIRS and early sepsis
92. Blood-based transcriptomics: leukemias and beyond
93. Gene expression profiling of peripheral blood cells for early detection of breast cancer
94. Systematic identification and validation of candidate genes for detection of circulating tumor cells in peripheral blood specimens of colorectal cancer patients
95. Multigene real-time PCR detection of circulating tumor cells in peripheral blood of lung cancer patients
96. High-sensitivity array analysis of gene expression for the early detection of disseminated breast tumor cells in peripheral blood
97. Using transcriptional profiling to develop a diagnostic test of operational tolerance in liver transplant recipients
98. Gene expression profile analysis of the peripheral blood mononuclear cells from tolerant living-donor liver transplant recipients
99. Identification of a peripheral blood transcriptional biomarker panel associated with operational renal allograft tolerance
100. NCE CECR Centre of Excellence for the Prevention of Organ Failure: Whole blood genomic biomarkers of acute cardiac allograft rejection
101. Feasibility of diagnosing subclinical renal allograft rejection in children by whole blood gene expression analysis
102. Microarray analysis of rejection in human kidney transplants using pathogenesis-based transcript sets
103. Molecular heterogeneity in acute renal allograft rejection identified by DNA microarray profiling
104. Early prognosis of the development of renal chronic allograft rejection by gene expression profiling of human protocol biopsies
105. Blood genomic responses differ after stroke, seizures, hypoglycemia, and hypoxia: blood genomic fingerprints of disease
106. Class A macrophage scavenger receptor gene expression levels in peripheral blood mononuclear cells specifically increase in patients with acute coronary syndrome
107. Using peripheral blood mononuclear cells to determine a gene expression profile of acute ischemic stroke: a pilot investigation
108. Gene expression profiles in peripheral blood mononuclear cells of chronic heart
109. Transcriptional profiling of Alzheimer blood mononuclear cells by microarray
110. Gene expression changes in blood as a putative biomarker for Huntington's disease
111. Genome-wide expression profiling of human blood reveals biomarkers for Huntington's disease
112. Profiling of genes expressed in peripheral blood mononuclear cells predicts glucocorticoid sensitivity in asthma patients
113. Expression profiling of genes related to asthma exacerbations
114. Diesel exhaust inhalation and assessment of peripheral blood mononuclear cell gene transcription effects: an exploratory study of healthy human volunteers
115. Changes in the peripheral blood transcriptome associated with occupational benzene exposure identified by cross-comparison on two microarray platforms
116. Blood gene expression signatures predict exposure levels
117. Physical exercise-associated gene expression signatures in peripheral blood
118. Effects of exercise on gene expression in human peripheral blood mononuclear cells
119. Laughter up-regulates the genes related to NK cell activity in diabetes
120. Data management: it starts at the bench
121. Integrated analysis of protein composition, tissue diversity, and gene regulation in mouse mitochondria
122. Genome-wide discovery of transcriptional modules from DNA sequence and gene expression
123. Microarray data analysis: from disarray to consolidation and consensus
124. Geometric interpretation of gene coexpression network analysis
125. Controlling the false discovery rate: a practical and powerful approach to multiple testing
126. How large a training set is needed to develop a classifier for microarray data?
127. A mixture model approach to sample size estimation in two-sample comparative microarray experiments
128. False discovery rate, sensitivity and sample size for microarray studies
129. Microarray experimental design: power and sample size considerations
130. The functions of animal microRNAs
131. MicroRNAs: genomics, biogenesis, mechanism, and function
132. microRNA functions
133. MicroRNA identification in plasma and serum: a new tool to diagnose and monitor diseases
134. Identification of an evolutionarily conserved transcriptional signature of CD8 memory differentiation that is shared by T and B cells
135. Prediction of graft-versus-host disease in humans by donor gene-expression profiling
136. Computational expression deconvolution in a complex mammalian organ
137. Expression deconvolution: a reinterpretation of DNA microarray data reveals dynamic changes in cell populations
138. In silico microdissection of microarray data from heterogeneous cell populations
139. Deconvolution of blood microarray data identifies cellular activation patterns in systemic lupus erythematosus
140. Biomarker discovery in heterogeneous tissue samples - taking the in-silico deconfounding approach
141. Cell type-specific gene expression differences in complex tissues
142. Yellow fever vaccine induces integrated multilineage and polyfunctional immune responses
143. Systems biology approach predicts immunogenicity of the yellow fever vaccine in humans

The work of the authors is supported by the Baylor Health Care System Foundation and the National Institutes of Health (U19 AIO57234-02, U01 AI082110, P01 CA084512).

Published: 1 July 2010