key: cord-0290034-zrudz2bg
authors: Razzaq, M.; Iglesias, M.; Ibrahim, M.; Goumidi, L.; Soukarieh, O.; Proust, C.; Roux, M.; Suchon, P.; Boland, A.; Daiain, D.; Olaso, R.; Butler, L.; Deleuze, J. F.; Odeberg, J.; Morange, P.-E.; Tregouet, D. A.
title: An artificial neural network approach integrating plasma proteomics and genetic data identifies PLXNA4 as a new susceptibility locus for pulmonary embolism.
date: 2020-10-06
journal: nan
DOI: 10.1101/2020.10.05.20207001
sha: c507fcd64694957ec30cee31850576cb36b1483c
doc_id: 290034
cord_uid: zrudz2bg

Venous thromboembolism is the third most common cardiovascular disease and is composed of two entities, deep vein thrombosis (DVT) and its fatal form, pulmonary embolism (PE). While PE is observed in ~40% of patients with documented DVT, there are few biomarkers that can help identify patients at high PE risk. To fill this need, we implemented an artificial neural network (ANN) with two hidden layers on 376 antibodies and 19 biological traits measured in the plasma of 1388 DVT patients, with or without PE, of the MARTHA study. We used the LIME algorithm to obtain a linear approximation of the resulting ANN prediction model. As MARTHA patients had been genotyped on genome-wide DNA arrays, a genome wide association study (GWAS) was conducted on the LIME estimate. Detected single nucleotide polymorphisms (SNPs) were tested for association with PE risk in MARTHA. Main findings were replicated in the EOVT study composed of 143 PE patients and 196 DVT only patients. The derived ANN model for PE achieved an accuracy of 0.89 and 0.79 in our training and testing sets, respectively. A GWAS on the LIME approximation identified a strong statistical association peak (p = 5.3 × 10⁻⁷) at the PLXNA4 locus, with lead SNP rs1424597 at which the minor A allele was further shown to associate with an increased risk of PE (OR = 1.49 [1.12 - 1.98], p = 6.1 × 10⁻³).
Further association analysis in EOVT revealed that, in the combined MARTHA and EOVT samples, the rs1424597-A allele was associated with increased PE risk (OR = 1.74 [1.27 - 2.38], p = 5.42 × 10⁻⁴) in patients over 37 years of age but not in younger patients (OR = 0.96 [0.65 - 1.41], p = 0.848). Using an original integrated proteomics and genetics strategy, we identified PLXNA4 as a new susceptibility gene for PE whose exact role now needs to be further elucidated.

Together with deep vein thrombosis, pulmonary embolism forms the so-called venous thromboembolism, the third most common cardiovascular disease, and its prevalence strongly increases with age. While pulmonary embolism is observed in ~40% of patients with deep vein thrombosis, there are currently few biomarkers that can help predict which patients with deep vein thrombosis are at risk of pulmonary embolism. We here deployed an Artificial Intelligence based methodology integrating both plasma proteomics and genetics data to identify novel biomarkers for PE. We thus identified the PLXNA4 gene as a novel molecular player involved in the pathophysiology of pulmonary embolism. In particular, using two independent cohorts totalling 1,881 patients with venous thromboembolism among which 467 experienced pulmonary embolism, we identified a genetic polymorphism in the PLXNA4 gene that associates with ~2 fold increased risk of pulmonary embolism in patients aged more than […]

[…] plasma is an ideal potential source for VTE biomarkers; the intravascular compartment itself is the site of disease manifestation and tests are relatively non-invasive, quick and cheap.
Several types of molecular determinants can be assessed in plasma samples including microRNAs, metabolites and proteins, and all of them hav […] risk, risk being classified based on clinical presentations and symptoms, with plasma samples profiled by matrix-assisted laser desorption/ionization-time-of-flight/time-of-flight mass spectrometry (MALDI-TOF/TOF MS).

medRxiv preprint, doi: https://doi.org/10.1101/2020.10.05.20207001; this version posted October 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

In this work, we aim at identifying novel molecular phenotypes that could help in better characterizing the biological mechanisms involved in the development of PE in VTE patients. For this, 234 plasma proteins targeted with 376 protein specific antibodies, with the major part derived from the Human Protein Atlas (HPA) repository [13], were profiled in 1388 VTE patients selected from the MARTHA study [14,15] and from whom 283 had experienced a symptomatic PE event.
To explore far beyond the search for linear associations between protein levels and PE risk and to identify mor […] risk which would, in addition, have a more meaningful biological interpretation. As MARTHA patients have been previously typed for genome-wide genotype data, we then conducted a genome wide association study of the LIME predictor of PE in order to detect single nucleotide polymorphisms (SNPs) associated with the predictor with the hope that the integration of genetic and proteomic data could provide additional insight.

[…] was 0.
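The idea of a "LIME predictor" can be illustrated with a minimal sketch of a LIME-style local linear surrogate: perturb an instance, query the model, and fit a proximity-weighted linear regression. This is a simplified illustration and not the authors' implementation. The function `black_box`, the Gaussian perturbation scheme, the exponential kernel and all parameter values are assumptions introduced here for the example.

```python
import math
import random


def black_box(x):
    """Hypothetical stand-in for a trained classifier: returns a probability."""
    z = 1.5 * x[0] - 2.0 * x[1] + 0.5 * x[0] * x[1]
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve the small linear system a * w = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def lime_explain(predict, x0, n_samples=2000, scale=0.5, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear model to `predict` around the point x0.

    Returns [intercept, coef_1, ..., coef_p]. The sampling scale and kernel
    width are illustrative defaults, not values taken from the paper.
    """
    rng = random.Random(seed)
    p = len(x0)
    # Accumulate the weighted normal equations (X^T W X) w = X^T W y.
    xtwx = [[0.0] * (p + 1) for _ in range(p + 1)]
    xtwy = [0.0] * (p + 1)
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x0]       # perturbed sample
        d2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x0))   # squared distance to x0
        w = math.exp(-d2 / kernel_width ** 2)               # proximity weight
        row = [1.0] + z                                     # leading 1 = intercept
        y = predict(z)
        for i in range(p + 1):
            xtwy[i] += w * row[i] * y
            for j in range(p + 1):
                xtwx[i][j] += w * row[i] * row[j]
    return solve(xtwx, xtwy)


# Local linear explanation of the hypothetical model at the origin.
local_weights = lime_explain(black_box, [0.0, 0.0])
print(local_weights)
```

The fitted coefficients give a linear score per individual, which is the kind of quantitative trait that can then be carried into an association analysis.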
Genetics of the LIME predictor

To get additional information about the biological mechanisms that could underlie the linear LIME predictor, we conduc […]

[…] e to patient selection criteria (i.e. according to age). We thus split the EOVT samples according to the median of age of VTE onset, that was 37 yrs. As shown in Table 4, the pattern of association of rs1424597 with PE slightly differed according to ag […]

[…] that our ANN/LIME models poorly behave in women under OC, we nevertheless sought to investigate whether discordant predictions could be due to genomic outlier individuals harboring very rare disease causing mutations that could make the globa […]
[…] antibody targeting PLXNA4 was available when the screening phase of this work was initiated, preventing us from validating further its association with PE. Second, no proteomic data was available in the EOVT study to formally replicate the association o […]

[…] of any well characterized genetic risk factors including antithrombin, protein C or protein S deficiency, homozygosity for FV Leiden or Factor II 20210A, and lupus anticoagulant. Detailed description of the MARTHA population has been provide […]

[…] complexes were cross-linked by resuspending the beads in 0.4% PFA-PBS for 10 min. R-phycoerythrin-conjugated streptavidin (1:750, PBS-T; Invitrogen) was added to all samples for 30 min followed by two washes. Relative amount of each protei […]

[…] was then backpropagated using a gradient descent algorithm [52] (with learning rate of 0.01 and batch size of 32) to update weights according to their contribution to the error. In order to reduce over-fitting and obtain the best performing model, the callback featur […]

[…] model based on the inverse-variance weighting and heterogeneity of association between the two studies was assessed by the Cochran-Mantel-Haenszel test statistic [5
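Inverse-variance weighting, as mentioned above for combining the two studies, can be sketched as follows. The sketch first recovers a log odds ratio and its standard error from a reported OR and 95% confidence interval, then pools estimates under a fixed-effect model. The first input is the MARTHA estimate reported in the abstract (OR = 1.49 [1.12 - 1.98]). The second study's values are hypothetical and only there to make the example run. Cochran's Q is computed as a commonly used heterogeneity measure, which is an assumption here and may differ from the statistic actually used by the authors.

```python
import math


def log_or_and_se(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Convert an odds ratio with its 95% CI into (log-OR, standard error)."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    return beta, se


def two_sided_p(z):
    """Two-sided p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def fixed_effect_meta(estimates):
    """Pool (beta, se) pairs by inverse-variance weighting.

    Returns (pooled beta, pooled standard error, Cochran's Q).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    return pooled, pooled_se, q


# MARTHA estimate as reported in the abstract.
martha = log_or_and_se(1.49, 1.12, 1.98)
print("MARTHA p-value:", two_sided_p(martha[0] / martha[1]))

# Hypothetical second study, for illustration only (not a result from the paper).
other = log_or_and_se(1.30, 0.85, 1.99)

beta, se, q = fixed_effect_meta([martha, other])
print("pooled OR:", math.exp(beta), "Cochran's Q:", q)
```

Recomputing the p-value from the reported interval is a quick consistency check: the MARTHA estimate gives a two-sided p-value close to the 6.1 × 10⁻³ stated in the abstract.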
Variant calling was performed using the GATK HaplotypeCaller (GenomeAnalysisTK-v3.3-0, https://software.broadinstitute.org/gatk/documentation/article.php?id=4148). Single-sample gVCF files were then aggregated using GATK CombineGVCFs, and joint genotype calling was performed by GATK GenotypeGVCFs. Recalibration was then conducted on the whole gVCF following GATK guidelines. Following GATK VQSR, we retained single nucleotide variants in the 99.5% tranche sensitivity threshold and indels in the 99% tranche sensitivity threshold for further analysis and annotated them using Annovar [55].

As a strategy to identify candidate variants that could explain the VTE phenotype in individuals with discordant class prediction, we first prioritized variants that were likely functional (stop loss/stop gain, frameshift, non-synonymous and splicing variants), located in known VTE associated genes (ABO, ARID4A, C4BPB, EIF5A, F2, F3, F5, F8, F9, F13A1, FGG, GRK5, MPHOSPH9, MAST2, NUGCC, OSMR, PLAT, PLCG2, PLEK1, PROC, PROS1, SCARA5, SERPINC1, SLC44A2, STAB2, STX10, STXBP5, THBD, TSPAN15, VWF) [56-58], that had not been reported or were reported at a low frequency (<1‰) in public genomic data repositories (dbSNP, GnomAD), and that were present in only one of the 200 sequenced patients. If no candidate variant was identified in known VTE genes, we extended our search to all coding genes and also took into account the predicted deleteriousness of selected candidates using in silico tools such as SIFT, PolyPhen and CADD-v1.
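The prioritization rules described above can be sketched as a simple filter over annotated variants. This is an illustrative sketch, not the authors' pipeline. The record fields (`id`, `gene`, `effect`, `pop_freq`, `n_carriers`) and the effect labels are hypothetical names chosen for this example. The gene list is copied from the text, and in silico deleteriousness scoring is left out.

```python
# Variant classes treated as "likely functional" (labels are hypothetical).
FUNCTIONAL_EFFECTS = {"stoploss", "stopgain", "frameshift", "nonsynonymous", "splicing"}

# Known VTE-associated genes, as listed in the text.
VTE_GENES = {
    "ABO", "ARID4A", "C4BPB", "EIF5A", "F2", "F3", "F5", "F8", "F9", "F13A1",
    "FGG", "GRK5", "MPHOSPH9", "MAST2", "NUGCC", "OSMR", "PLAT", "PLCG2",
    "PLEK1", "PROC", "PROS1", "SCARA5", "SERPINC1", "SLC44A2", "STAB2",
    "STX10", "STXBP5", "THBD", "TSPAN15", "VWF",
}

MAX_POP_FREQ = 0.001  # "<1 per mille" in public repositories


def passes_filters(variant, genes):
    """Apply the functional, gene, frequency and single-carrier criteria."""
    if variant["effect"] not in FUNCTIONAL_EFFECTS:
        return False
    if genes is not None and variant["gene"] not in genes:
        return False
    freq = variant["pop_freq"]  # None means "not reported in public databases"
    if freq is not None and freq >= MAX_POP_FREQ:
        return False
    return variant["n_carriers"] == 1  # seen in only one sequenced patient


def find_candidates(variants):
    """Search known VTE genes first; fall back to all genes if nothing is found."""
    hits = [v for v in variants if passes_filters(v, VTE_GENES)]
    if hits:
        return hits
    return [v for v in variants if passes_filters(v, None)]


# Hypothetical annotated variants for one patient.
example = [
    {"id": "v1", "gene": "PROC", "effect": "nonsynonymous", "pop_freq": None, "n_carriers": 1},
    {"id": "v2", "gene": "F5", "effect": "stopgain", "pop_freq": 0.02, "n_carriers": 1},
    {"id": "v3", "gene": "VWF", "effect": "splicing", "pop_freq": 0.0002, "n_carriers": 3},
]
print([v["id"] for v in find_candidates(example)])
```

Keeping the gene restriction as a parameter makes the two-stage search explicit: the same criteria are reused for the fallback, only the gene set is relaxed.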
References

The epidemiology of venous thromboembolism
2014 ESC guidelines on the diagnosis and management of acute pulmonary embolism
Pulmonary embolism or thrombosis in ARDS COVID-19 patients: A French monocenter retrospective study. PLoS One
Cannegieter SC. Broadening the factor V Leiden paradox: pulmonary embolism and deep-vein thrombosis as 2 sides of the spectrum
A Platelet Function Modulator of Thrombin Activation Is Causally Linked Cardiovascular Disease and Affects PAR4 Receptor Signaling
Association of recurrent venous thromboembolism and circulating microRNAs
Bayesian network analysis of plasma microRNA sequencing data in patients with venous thrombosis
PDGFB, a new candidate plasma biomarker for venous thromboembolism: results from the VEREMA affinity proteomics study
Discovery of novel plasma biomarkers for future incident venous thromboembolism by untargeted synchronous precursor selection mass spectrometry proteomics
Plasma Biomarkers and Identification of Resilient Metabolic Disruptions in Patients With Venous Thromboembolism Using a Metabolic Systems Approach
Metabolomic analysis of 92 pulmonary embolism patients from a nested case-control study identifies metabolites associated with adverse clinical outcomes
Identification of reduced circulating haptoglobin concentration as a biomarker of the severity of pulmonary embolism: a nontargeted proteomic study
Tissue-based map of the human proteome
Genome wide association study for plasma levels of natural anticoagulant inhibitors and protein C anticoagulant pathway: the MARTHA project
Meta-analysis of 65,734 individuals identifies TSPAN15 and SLC44A2 as two susceptibility loci for venous thromboembolism
Asymptotic properties of nearest neighbor rules using edited data
"Why should i trust you?" explaining the predictions of any classifier
Genomic atlas of the human plasma proteome
Connecting genetic risk to disease end points through the human blood plasma proteome
Visualizing data using t-SNE. Journal of machine learning research
UMAP: Uniform manifold approximation and projection for dimension reduction
Analysis of Body-wide Unfractionated Tissue Data to Identify a Core Human Endothelial
Risk factors for venous thromboembolism
Brain age prediction using deep learning uncovers associated sequence variants
Increased risk of venous thrombosis in carriers of hereditary protein C deficiency defect
The spectrum of genetic defects in a panel of 40 Dutch families with symptomatic protein C deficiency type I: heterogeneity and founder effects
Genomic HEXploring allows landscaping of novel potential splicing regulatory elements
Quantitative evaluation of all hexamers as exonic splicing elements
Large-scale comparative evaluation of user-friendly tools for predicting variant-induced alterations of splicing regulatory elements
Selective testing for thrombophilia in patients with first venous thrombosis: results from a retrospective family cohort study on absolute thrombotic risk for currently known thrombophilic defects in 2479 relatives
Systematic assessment of antibody selectivity in plasma based on a resource of enrichment profiles
Semaphorins in health and disease
Common Variants in PLXNA4 and Correlation to CSF-related Phenotypes in Alzheimer's Disease. Front Neurosci
PLXNA4 is associated with Alzheimer disease and modulates tau phosphorylation
Plexin-A4 negatively regulates T lymphocyte responses
Plexin-A4-semaphorin 3A signaling is required for Toll-like receptor- and sepsis-induced cytokine storm
Semaphoring vascular morphogenesis
Negative regulation of platelet function by a secreted cell repulsive protein, semaphorin 3A
A genome-wide analysis of the response to inhaled β2-agonists in chronic obstructive pulmonary disease
Genome-wide association study of lung function decline in adults with and without asthma
A genome-wide search for common SNP x SNP interactions on the risk of venous thrombosis
Meta-analysis of exome array data identifies six novel genetic loci for lung function
Genetics of venous thrombosis: insights from a new genome wide association study
Highly multiplexed antibody suspension bead arrays for plasma protein profiling
PDGFB, a new candidate plasma biomarker for venous thromboembolism: results from the VEREMA affinity proteomics study
Common susceptibility alleles are unlikely to contribute as strongly as the FV and ABO loci to VTE risk: results from a GWAS approach
Adaptive synthetic sampling approach for imbalanced learning
Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit
Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition
Curry HB. The method of steepest descent for non-linear minimization problems
Statistical aspects of the analysis of data from retrospective studies of disease
Fast and accurate short read alignment with Burrows-Wheeler transform
ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data
Whole-exome sequencing identifies rare variants in STAB2 associated with venous thromboembolic disease
Genomic and Transcriptomic Association Studies Identify 16 Novel Susceptibility Loci for Venous Thromboembolism
What is currently known about the genetics of venous thromboembolism at the dawn of next generation sequencing technologies
A general framework for estimating the relative pathogenicity of human genetic variants

[Figure: study workflow. Recovered labels: ANN framework; Edited Nearest Neighbors; Training data; explanation; Integration of genetics data; Genome wide association study on LIME predictor; Association of lead SNP on PE risk in MARTHA; Replication in EOVT.]