key: cord-103505-9adtbwp2
authors: Hale, A. T.; Zhou, D.; Bastarache, L.; Wang, L.; Zinkel, S. S.; Schiff, S. J.; Ko, D. C.; Gamazon, E. R.
title: The genetic architecture of human infectious diseases and pathogen-induced cellular phenotypes
date: 2020-07-21
DOI: 10.1101/2020.07.19.20157404
doc_id: 103505
cord_uid: 9adtbwp2

Infectious diseases (ID) account for a significant proportion of morbidity and mortality worldwide. Host genetic variation is likely to contribute to ID risk and downstream clinical outcomes, but a genetics-anchored framework is needed to decipher the molecular mechanisms of disease risk, infer causal effects on potential complications, and identify instruments for drug target discovery. Here we perform transcriptome-wide association studies (TWAS) of 35 clinical ID traits in a cohort of 23,294 individuals, identifying 70 gene-level associations with 26 ID traits. Replication in two large-scale biobanks provides additional support for the identified associations. A phenome-scale scan of the 70 gene-level associations across hematologic, respiratory, cardiovascular, and neurologic traits suggests a molecular basis for known complications of the ID traits. Using Mendelian randomization, we then provide causal support for the effect of the ID traits on adverse outcomes. The resource we leverage, genetic information linked to serologic tests and to pathogen cultures from bronchoalveolar lavage, sputum, sinus/nasopharyngeal, tracheal, and blood samples (up to 7,699 positive pathogen cultures across 92 unique genera), provides a platform to interrogate the genetic basis of compartment-specific infection and colonization. To accelerate insights into cellular mechanisms, we develop a TWAS repository of gene-level associations, across a broad collection of human tissues, with 79 cellular phenotypes induced by pathogen exposure, as a discovery and replication platform.
Cellular phenotypes of infection by 8 pathogens included pathogen invasion, intercellular spread, cytokine production, and pyroptosis. These rich datasets will facilitate mechanistic insights into the role of host genetic variation in ID risk and pathophysiology, with important implications for our molecular understanding of potentially severe phenotypic outcomes.

The genetic basis of infectious disease (ID) risk and severity has been relatively […]

A schematic diagram illustrating our study design and the reference resource we provide can be found in Figure […].

The ID-associated genes tend to be less tissue-specific (i.e., more ubiquitously expressed) than the remaining genes (Figure S1A; Mann-Whitney U test on the tissue-specificity statistic, p = 7.5 × 10^-4). This may reflect the multi-tissue PrediXcan approach we implemented, which prioritizes genes with multi-tissue support to improve statistical power, but also the genes' pleiotropic potential. We hypothesized that tissue expression profiling of ID-associated genes can provide additional insights into disease etiologies and mechanisms. For example, NDUFA4, a gene associated with intestinal infection, is expressed in a broad set of tissues, including the alimentary canal, but displays relatively low expression in whole blood (Figure S1B). In addition, TOR4A, the most significant association with bacterial pneumonia (Table 1) […] (Table 5). Among the ID-associated genes, these data point to specific molecular mechanisms across ID traits with critical regulatory roles (e.g., protein modifications) in the host response.

We tested the hypothesis that distinct infectious agents exploit common pathways to find a compatible intracellular niche in the host, potentially implicating shared genetic risk factors. Notably, 64 of the 70 ID-associated genes […]

Notably, we identified an enrichment (FDR = 9.68 × 10^-3) for a highly conserved motif […] (Figure 7A).
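The tissue-specificity comparison above can be sketched as follows. The symbol of the statistic was lost in extraction, so this sketch assumes the widely used tau index (0 for uniform expression across tissues, 1 for expression in a single tissue). The expression vectors are made-up toy values, the function names are hypothetical, and the Mann-Whitney U p-value uses a normal approximation, which is not necessarily what the authors computed.

```python
import math
from typing import Dict, Sequence, Tuple


def tau(expression: Sequence[float]) -> float:
    """Tissue-specificity index tau for one gene across n tissues.

    0 = equal expression in all tissues, 1 = expressed in a single tissue.
    Assumes non-negative values (e.g. log-scale TPM) for at least two tissues.
    """
    n = len(expression)
    peak = max(expression) if n else 0.0
    if n < 2 or peak <= 0:
        raise ValueError("need >= 2 tissues and some non-zero expression")
    return sum(1.0 - x / peak for x in expression) / (n - 1)


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (U for sample a, two-sided p) via the normal approximation.

    Ties receive mid-ranks; the variance is NOT tie-corrected, so p is
    approximate when many values are tied or the samples are small.
    """
    pooled = sorted(list(a) + list(b))
    midrank: Dict[float, float] = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j
    n1, n2 = len(a), len(b)
    u1 = sum(midrank[x] for x in a) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value


# Toy usage: tau for hypothetical "ID-associated" genes vs. other genes.
id_genes = [tau(v) for v in ([5, 5, 4, 5], [3, 3, 2, 3], [6, 5, 6, 6])]
other_genes = [tau(v) for v in ([9, 0, 1, 0], [0, 7, 0, 0], [4, 1, 0, 1])]
u, p = mann_whitney_u(id_genes, other_genes)
```

In practice a library routine such as `scipy.stats.mannwhitneyu` would be preferable; the hand-rolled version only shows what the test compares, namely whether one group's tau values rank systematically lower (more ubiquitous expression) than the other's.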
In addition, we identified significantly more replicated SNP […]

Our study provides a reference atlas of genetic variants and genetically-determined […]

REFERENCES
Integrating predicted transcriptome from multiple tissues improves association detection. PLoS Genet 15, e1007889.
Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue […]
Ojala […]. The apoptotic v-cyclin-CDK6 complex phosphorylates and inactivates Bcl-2.
Survival of tissue-resident memory T cells requires exogenous lipid uptake and metabolism.
Conversion of p35 to p25 deregulates Cdk5 activity and promotes neurodegeneration.
Cdk5 Deletion Enhances the Anti-inflammatory Potential of GC-Mediated GR Activation During Inflammation. Frontiers in Immunology 10.
The human papillomavirus type 16 E7 gene encodes transactivation and transformation functions similar to those of adenovirus E1A.
Microbial genome-wide association studies: lessons from human GWAS.
Principal components analysis corrects for stratification in genome-wide association studies.
The ability to replicate in macrophages is conserved between Yersinia pestis and Yersinia pseudotuberculosis.
Genome-wide methylation analysis and epigenetic unmasking identify tumor suppressor genes in hepatocellular carcinoma.
Development of a large-scale de-identified DNA biobank to enable personalized medicine.
Atg9a controls dsDNA-driven dynamic translocation of STING and the innate immune response.
Quantitative proteomics reveals metabolic and pathogenic properties of Chlamydia trachomatis developmental forms.
Human immune disorder arising from mutation of the alpha chain of the interleukin-2 receptor.
Contrasting the Genetic Architecture of 30 Complex Traits from Summary Association Data.
Analysis of genome-wide association data highlights candidates for drug repositioning in psychiatry.
Phosphoproteomics to Characterize Host Response During Influenza A Virus Infection of Human Macrophages.
Phosphoproteomic analyses reveal signaling pathways that facilitate lytic gammaherpesvirus replication.
Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.
Lactic Acidosis in Sepsis: It's Not All Anaerobic: Implications for Diagnosis and Management.
Diagnostic value of MUC1 and EpCAM mRNA as tumor markers in differentiating benign from malignant pleural effusion.
Structure and regulation of the CDK5-p25(nck5a) complex.
Essential Function for the Nuclear Protein Akirin2 in B Cell Activation and Humoral Immune Responses.
Akirin2 is critical for inducing inflammatory genes by bridging IkappaB-zeta and the SWI/SNF complex.
Investigating the possible causal association of smoking with depression and anxiety using Mendelian randomisation meta-analysis: the CARTA consortium.
Subversion of the actin cytoskeleton during viral infection.
Genome-wide association and HLA region fine-mapping studies identify susceptibility loci for multiple common infections. Nat Commun 8, 599.
[…] known genetic associations and discovery of new genetic disorders.

Pathogen culture and virology data linked to whole-genome genetic information
The SD consists of a wide range of clinical microbiological data. For individuals with whole-genome genetic information, we analyzed pathogen (bacterial, mycobacterial, and fungal) culture data derived from positive cultures of the following clinical samples: 1) […] 2) sputum (n = 2,478), 3) sinus/nasopharyngeal (n = 1,820), 4) bronchoalveolar lavage (n = 1,265), and 5) tracheal sampling (n = 422). Furthermore, we analyzed a respiratory panel containing 28 viral strains from 2,890 individuals with whole-genome genetic information.
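As a minimal sketch of how culture records like these might be tabulated per individual and per sample site: the record layout, the status labels ("positive", "ambiguous", "negative"), and the names `Culture`, `PersonSummary`, `summarize`, and `positives_by_site` are all hypothetical, not the SD schema, and the toy records are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class Culture:
    """One culture result. Hypothetical layout, not the actual SD schema."""
    person_id: str
    site: str                      # e.g. "sputum", "tracheal"
    status: str                    # assumed: "positive", "ambiguous" or "negative"
    genera: Tuple[str, ...] = ()   # genera isolated from this culture, if any


@dataclass
class PersonSummary:
    total: int = 0
    ambiguous: int = 0
    positive: int = 0
    genera: Set[str] = field(default_factory=set)


def summarize(cultures: Iterable[Culture]) -> Dict[str, PersonSummary]:
    """Per-individual counts of all, ambiguous and positive cultures, plus genera."""
    out: Dict[str, PersonSummary] = {}
    for c in cultures:
        s = out.setdefault(c.person_id, PersonSummary())
        s.total += 1
        if c.status == "ambiguous":
            s.ambiguous += 1
        elif c.status == "positive":
            s.positive += 1
            s.genera.update(c.genera)  # assumption: only positive cultures add genera
    return out


def positives_by_site(cultures: Iterable[Culture]) -> Counter:
    """Number of positive cultures per sample site."""
    return Counter(c.site for c in cultures if c.status == "positive")


records = [
    Culture("p1", "sputum", "positive", ("Staphylococcus",)),
    Culture("p1", "sputum", "ambiguous"),
    Culture("p1", "tracheal", "positive", ("Pseudomonas", "Staphylococcus")),
    Culture("p2", "sinus/nasopharyngeal", "negative"),
]
summary = summarize(records)
by_site = positives_by_site(records)
```

Keeping genera as a set per individual means a genus isolated repeatedly (here Staphylococcus) is counted once, which is one reasonable reading of "genus or genera isolated"; per-sample counts would need the tuple kept per culture instead.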
Viral strains included the following: 1) Adenovirus, 2) Bocavirus, 3) Bordetella parapertussis, 4) Bordetella pertussis, 5) Chlamydia pneumoniae, 6) Coronavirus […], 7) Coronavirus HKU1, 8) Coronavirus NL63, 9) Coronavirus NOS, 10) Coronavirus OC43, 11) […], 12) Human Metapneumovirus, 13) Influenza A, 14) Influenza A, H1, 15) […], 16) Influenza A, H3, 17) Influenza B, 18) Mycoplasma pneumoniae, 19) […], 20) Parainfluenza 1, 21) Parainfluenza 2, 22) Parainfluenza 3, 23) Parainfluenza 4, 24) Respiratory syncytial virus (RSV), 25) RSV, A, 26) RSV, B, and 27) Rhinovirus. The pathogen information for each individual in our study included: 1) Total number of cultures; 2) […]; 3) Number of ambiguous cultures (i.e., normal upper respiratory bacteria or low-level contamination); 4) Number of positive cultures (i.e., the number of cultures with growth consistent with clinical infection); 5) Genus or genera isolated (up to 96 unique genera per sample site), which ranged from zero to 10 per sample.

METHODS DETAILS
GWAS of ID traits
[…] (IVs) […]. Only biallelic non-palindromic variants were considered as IVs. […] MR-Egger regression generalizes the inverse-variance weighted (IVW) method, in which the intercept is constrained to zero, by estimating the intercept freely so that it can capture directional pleiotropy. We also used the weighted-median estimator […]

High-throughput Human in vitrO Susceptibility Testing
[…] About/, and phenotype definitions and family-based GWAS of the Hi-HOST Phenome Project were previously described […] 2015) were obtained from the Coriell Institute. The LCLs represented diverse populations, including ESN (Esan in Nigeria […] LCLs were cultured in RPMI 1640 media containing 10% fetal bovine serum, 2 mM glutamine […]. Salmonella infection was performed using pMMB67GFP (Pujol and Bliska, 2003), and the sifA deletion was constructed using lambda Red recombination and validated using PCR […] correction for the total number of genes tested (n = 9,868) across 35 phenotypes (i.e., p < 1.4 × 10^-7).
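The significance thresholds are plain Bonferroni arithmetic. Assuming a family-wise alpha of 0.05 (not stated in this excerpt, but consistent with both reported cut-offs), the study-wide threshold divides alpha by all gene-trait tests (9,868 × 35), and the trait-specific threshold divides it by the genes tested per trait. The helper name `bonferroni_threshold` is hypothetical.

```python
ALPHA = 0.05      # assumed family-wise error rate
N_GENES = 9_868   # genes tested per trait (from the text)
N_TRAITS = 35     # ID phenotypes (from the text)


def bonferroni_threshold(n_tests: int, alpha: float = ALPHA) -> float:
    """Per-test p-value cut-off controlling the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be >= 1")
    return alpha / n_tests


study_wide = bonferroni_threshold(N_GENES * N_TRAITS)  # all gene-trait pairs
per_trait = bonferroni_threshold(N_GENES)              # genes within one trait
print(f"study-wide p < {study_wide:.3g}")  # study-wide p < 1.45e-07
print(f"per-trait  p < {per_trait:.3g}")   # per-trait  p < 5.07e-06
```

The study-wide value, 0.05 / 345,380 ≈ 1.45 × 10^-7, matches the reported 1.4 × 10^-7 up to rounding.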
Trait-specific significance was determined using Bonferroni correction for the total number of genes tested (n = 9,868; p < 5.07 × 10^-6). Genomic ancestry was quantified using