id sid tid token lemma pos 10_1101-436634 1 1 RegTools regtool NNS 10_1101-436634 1 2 : : : 10_1101-436634 1 3 Integrated integrate VBN 10_1101-436634 1 4 analysis analysis NN 10_1101-436634 1 5 of of IN 10_1101-436634 1 6 genomic genomic JJ 10_1101-436634 1 7 and and CC 10_1101-436634 1 8 transcriptomic transcriptomic JJ 10_1101-436634 1 9 data datum NNS 10_1101-436634 1 10 for for IN 10_1101-436634 1 11 the the DT 10_1101-436634 1 12 discovery discovery NN 10_1101-436634 1 13 of of IN 10_1101-436634 1 14 splicing splicing NN 10_1101-436634 1 15 variants variant NNS 10_1101-436634 1 16 in in IN 10_1101-436634 1 17 cancer cancer NN 10_1101-436634 1 36 Kelsy Kelsy NNP 10_1101-436634 1 37 C. C. NNP 10_1101-436634 1 38 Cotto1,2,† Cotto1,2,† NNP 10_1101-436634 1 39 , , , 10_1101-436634 1 40 Yang Yang NNP 10_1101-436634 1 41 - - HYPH 10_1101-436634 1 42 Yang Yang NNP 10_1101-436634 1 43 Feng2,† Feng2,† NNP 10_1101-436634 1 44 , , , 10_1101-436634 1 45 Avinash Avinash NNP 10_1101-436634 1 46 Ramu3 ramu3 NN 10_1101-436634 1 47 , , , 10_1101-436634 1 48 Zachary Zachary NNP 10_1101-436634 1 49 L. L. 
NNP 10_1101-436634 1 50 Skidmore1,2 skidmore1,2 CD 10_1101-436634 1 51 , , , 10_1101-436634 1 52 Jason Jason NNP 10_1101-436634 1 53 Kunisaki2 Kunisaki2 NNP 10_1101-436634 1 54 , , , 10_1101-436634 1 55 Megan Megan NNP 10_1101-436634 1 56 Richters1,2 richters1,2 CD 10_1101-436634 1 57 , , , 10_1101-436634 1 58 Sharon Sharon NNP 10_1101-436634 1 59 Freshour1,2 Freshour1,2 NNP 10_1101-436634 1 60 , , , 10_1101-436634 1 61 Yiing Yiing NNP 10_1101-436634 1 62 Lin4 Lin4 NNP 10_1101-436634 1 63 , , , 10_1101-436634 1 64 William William NNP 10_1101-436634 1 65 C. C. NNP 10_1101-436634 1 66 Chapman4 Chapman4 NNP 10_1101-436634 1 67 , , , 10_1101-436634 1 68 Ravindra Ravindra NNP 10_1101-436634 1 69 Uppaluri5,6 uppaluri5,6 CD 10_1101-436634 1 70 , , , 10_1101-436634 1 71 Ramaswamy Ramaswamy NNP 10_1101-436634 1 72 Govindan1,7 govindan1,7 CD 10_1101-436634 1 73 , , , 10_1101-436634 1 74 Obi Obi NNP 10_1101-436634 1 75 L. L. NNP 10_1101-436634 1 76 Griffith1,2,3,7 Griffith1,2,3,7 NNP 10_1101-436634 1 77 * * NFP 10_1101-436634 1 78 , , , 10_1101-436634 1 79 Malachi Malachi NNP 10_1101-436634 1 80 Griffith1,2,3,7 Griffith1,2,3,7 NNP 10_1101-436634 1 81 * * NFP 10_1101-436634 1 82 † † NN 10_1101-436634 1 83 denotes denote VBZ 10_1101-436634 1 84 co co JJ 10_1101-436634 1 85 - - JJ 10_1101-436634 1 86 first first JJ 10_1101-436634 1 87 authors author NNS 10_1101-436634 1 88 . . . 10_1101-436634 2 1 * * NFP 10_1101-436634 2 2 denotes denote VBZ 10_1101-436634 2 3 corresponding correspond VBG 10_1101-436634 2 4 authors author NNS 10_1101-436634 2 5 . . . 10_1101-436634 3 1 Correspondence correspondence NN 10_1101-436634 3 2 to to IN 10_1101-436634 3 3 Obi Obi NNP 10_1101-436634 3 4 L. L. 
NNP 10_1101-436634 3 5 Griffith Griffith NNP 10_1101-436634 3 6 ( ( -LRB- 10_1101-436634 3 7 obigriffith@wustl.edu obigriffith@wustl.edu NN 10_1101-436634 3 8 ) ) -RRB- 10_1101-436634 3 9 and and CC 10_1101-436634 3 10 Malachi Malachi NNP 10_1101-436634 3 11 Griffith Griffith NNP 10_1101-436634 3 12 ( ( -LRB- 10_1101-436634 3 13 mgriffit@wustl.edu mgriffit@wustl.edu NN 10_1101-436634 3 14 ) ) -RRB- 10_1101-436634 3 15 . . . 10_1101-436634 4 1 Affiliations affiliation NNS 10_1101-436634 4 2 : : : 10_1101-436634 4 3 1 1 CD 10_1101-436634 4 4 . . . 10_1101-436634 5 1 Division Division NNP 10_1101-436634 5 2 of of IN 10_1101-436634 5 3 Oncology Oncology NNP 10_1101-436634 5 4 , , , 10_1101-436634 5 5 Department Department NNP 10_1101-436634 5 6 of of IN 10_1101-436634 5 7 Medicine Medicine NNP 10_1101-436634 5 8 , , , 10_1101-436634 5 9 Washington Washington NNP 10_1101-436634 5 10 University University NNP 10_1101-436634 5 11 School School NNP 10_1101-436634 5 12 of of IN 10_1101-436634 5 13 Medicine Medicine NNP 10_1101-436634 5 14 , , , 10_1101-436634 5 15 St. St. NNP 10_1101-436634 5 16 Louis Louis NNP 10_1101-436634 5 17 , , , 10_1101-436634 5 18 MO MO NNP 10_1101-436634 5 19 , , , 10_1101-436634 5 20 USA USA NNP 10_1101-436634 5 21 2 2 CD 10_1101-436634 5 22 . . . 10_1101-436634 6 1 McDonnell McDonnell NNP 10_1101-436634 6 2 Genome Genome NNP 10_1101-436634 6 3 Institute Institute NNP 10_1101-436634 6 4 , , , 10_1101-436634 6 5 Washington Washington NNP 10_1101-436634 6 6 University University NNP 10_1101-436634 6 7 School School NNP 10_1101-436634 6 8 of of IN 10_1101-436634 6 9 Medicine Medicine NNP 10_1101-436634 6 10 , , , 10_1101-436634 6 11 St. St. NNP 10_1101-436634 6 12 Louis Louis NNP 10_1101-436634 6 13 , , , 10_1101-436634 6 14 MO MO NNP 10_1101-436634 6 15 , , , 10_1101-436634 6 16 USA USA NNP 10_1101-436634 6 17 3 3 CD 10_1101-436634 6 18 . . . 
10_1101-436634 7 1 Department Department NNP 10_1101-436634 7 2 of of IN 10_1101-436634 7 3 Genetics Genetics NNP 10_1101-436634 7 4 , , , 10_1101-436634 7 5 Washington Washington NNP 10_1101-436634 7 6 University University NNP 10_1101-436634 7 7 School School NNP 10_1101-436634 7 8 of of IN 10_1101-436634 7 9 Medicine Medicine NNP 10_1101-436634 7 10 , , , 10_1101-436634 7 11 St. St. NNP 10_1101-436634 7 12 Louis Louis NNP 10_1101-436634 7 13 , , , 10_1101-436634 7 14 MO MO NNP 10_1101-436634 7 15 , , , 10_1101-436634 7 16 USA USA NNP 10_1101-436634 7 17 4 4 CD 10_1101-436634 7 18 . . . 10_1101-436634 8 1 Department Department NNP 10_1101-436634 8 2 of of IN 10_1101-436634 8 3 Surgery Surgery NNP 10_1101-436634 8 4 , , , 10_1101-436634 8 5 Washington Washington NNP 10_1101-436634 8 6 University University NNP 10_1101-436634 8 7 School School NNP 10_1101-436634 8 8 of of IN 10_1101-436634 8 9 Medicine Medicine NNP 10_1101-436634 8 10 , , , 10_1101-436634 8 11 St. St. NNP 10_1101-436634 8 12 Louis Louis NNP 10_1101-436634 8 13 , , , 10_1101-436634 8 14 MO MO NNP 10_1101-436634 8 15 , , , 10_1101-436634 8 16 USA USA NNP 10_1101-436634 8 17 5 5 CD 10_1101-436634 8 18 . . . 10_1101-436634 9 1 Department Department NNP 10_1101-436634 9 2 of of IN 10_1101-436634 9 3 Surgery Surgery NNP 10_1101-436634 9 4 , , , 10_1101-436634 9 5 Brigham Brigham NNP 10_1101-436634 9 6 and and CC 10_1101-436634 9 7 Women Women NNP 10_1101-436634 9 8 ’s ’s POS 10_1101-436634 9 9 Hospital Hospital NNP 10_1101-436634 9 10 , , , 10_1101-436634 9 11 Boston Boston NNP 10_1101-436634 9 12 , , , 10_1101-436634 9 13 MA MA NNP 10_1101-436634 9 14 , , , 10_1101-436634 9 15 USA USA NNP 10_1101-436634 9 16 6 6 CD 10_1101-436634 9 17 . . . 
10_1101-436634 10 1 Department Department NNP 10_1101-436634 10 2 of of IN 10_1101-436634 10 3 Medical Medical NNP 10_1101-436634 10 4 Oncology Oncology NNP 10_1101-436634 10 5 , , , 10_1101-436634 10 6 Dana Dana NNP 10_1101-436634 10 7 - - HYPH 10_1101-436634 10 8 Farber Farber NNP 10_1101-436634 10 9 Cancer Cancer NNP 10_1101-436634 10 10 Institute Institute NNP 10_1101-436634 10 11 , , , 10_1101-436634 10 12 Boston Boston NNP 10_1101-436634 10 13 , , , 10_1101-436634 10 14 MA MA NNP 10_1101-436634 10 15 , , , 10_1101-436634 10 16 USA USA NNP 10_1101-436634 10 17 7 7 CD 10_1101-436634 10 18 . . . 10_1101-436634 11 1 Siteman Siteman NNP 10_1101-436634 11 2 Cancer Cancer NNP 10_1101-436634 11 3 Center Center NNP 10_1101-436634 11 4 , , , 10_1101-436634 11 5 Washington Washington NNP 10_1101-436634 11 6 University University NNP 10_1101-436634 11 7 School School NNP 10_1101-436634 11 8 of of IN 10_1101-436634 11 9 Medicine Medicine NNP 10_1101-436634 11 10 , , , 10_1101-436634 11 11 St. St. 
NNP 10_1101-436634 11 12 Louis Louis NNP 10_1101-436634 11 13 , , , 10_1101-436634 11 14 MO MO NNP 10_1101-436634 11 15 , , , 10_1101-436634 11 16 USA USA NNP 10_1101-436634 11 17 Abstract Abstract NNP 10_1101-436634 11 18 Somatic Somatic NNP 10_1101-436634 11 19 mutations mutation NNS 10_1101-436634 11 20 in in IN 10_1101-436634 11 21 non non JJ 10_1101-436634 11 22 - - JJ 10_1101-436634 11 23 coding coding JJ 10_1101-436634 11 24 regions region NNS 10_1101-436634 11 25 and and CC 10_1101-436634 11 26 even even RB 10_1101-436634 11 27 in in IN 10_1101-436634 11 28 exons exon NNS 10_1101-436634 11 29 may may MD 10_1101-436634 11 30 have have VB 10_1101-436634 11 31 unidentified unidentified JJ 10_1101-436634 11 32 regulatory regulatory JJ 10_1101-436634 11 33 consequences consequence NNS 10_1101-436634 11 34 which which WDT 10_1101-436634 11 35 are be VBP 10_1101-436634 11 36 often often RB 10_1101-436634 11 37 overlooked overlook VBN 10_1101-436634 11 38 in in IN 10_1101-436634 11 39 analysis analysis NN 10_1101-436634 11 40 workflows workflow NNS 10_1101-436634 11 41 . . . 
10_1101-436634 12 1 Here here RB 10_1101-436634 12 2 we -PRON- PRP 10_1101-436634 12 3 present present VBP 10_1101-436634 12 4 RegTools RegTools NNP 10_1101-436634 12 5 ( ( -LRB- 10_1101-436634 12 6 www.regtools.org www.regtools.org NNP 10_1101-436634 12 7 ) ) -RRB- 10_1101-436634 12 8 , , , 10_1101-436634 12 9 a a DT 10_1101-436634 12 10 free free JJ 10_1101-436634 12 11 , , , 10_1101-436634 12 12 open open JJ 10_1101-436634 12 13 - - HYPH 10_1101-436634 12 14 source source NN 10_1101-436634 12 15 software software NN 10_1101-436634 12 16 package package NN 10_1101-436634 12 17 designed design VBN 10_1101-436634 12 18 to to TO 10_1101-436634 12 19 integrate integrate VB 10_1101-436634 12 20 analysis analysis NN 10_1101-436634 12 21 of of IN 10_1101-436634 12 22 somatic somatic JJ 10_1101-436634 12 23 variants variant NNS 10_1101-436634 12 24 from from IN 10_1101-436634 12 25 genomic genomic JJ 10_1101-436634 12 26 data datum NNS 10_1101-436634 12 27 with with IN 10_1101-436634 12 28 splice splice NN 10_1101-436634 12 29 junctions junction NNS 10_1101-436634 12 30 from from IN 10_1101-436634 12 31 transcriptomic transcriptomic JJ 10_1101-436634 12 32 data datum NNS 10_1101-436634 12 33 to to TO 10_1101-436634 12 34 identify identify VB 10_1101-436634 12 35 variants variant NNS 10_1101-436634 12 36 that that WDT 10_1101-436634 12 37 may may MD 10_1101-436634 12 38 cause cause VB 10_1101-436634 12 39 aberrant aberrant JJ 10_1101-436634 12 40 splicing splicing NN 10_1101-436634 12 41 . . . 
10_1101-436634 13 1 RegTools RegTools NNP 10_1101-436634 13 2 was be VBD 10_1101-436634 13 3 applied apply VBN 10_1101-436634 13 4 to to IN 10_1101-436634 13 5 over over IN 10_1101-436634 13 6 9,000 9,000 CD 10_1101-436634 13 7 tumor tumor NN 10_1101-436634 13 8 samples sample NNS 10_1101-436634 13 9 with with IN 10_1101-436634 13 10 both both DT 10_1101-436634 13 11 tumor tumor NN 10_1101-436634 13 12 DNA DNA NNP 10_1101-436634 13 13 and and CC 10_1101-436634 13 14 RNA RNA NNP 10_1101-436634 13 15 sequence sequence NN 10_1101-436634 13 16 data datum NNS 10_1101-436634 13 17 . . . 10_1101-436634 14 1 We -PRON- PRP 10_1101-436634 14 2 discovered discover VBD 10_1101-436634 14 3 235,778 235,778 CD 10_1101-436634 14 4 events event NNS 10_1101-436634 14 5 where where WRB 10_1101-436634 14 6 a a DT 10_1101-436634 14 7 variant variant JJ 10_1101-436634 14 8 significantly significantly RB 10_1101-436634 14 9 increased increase VBD 10_1101-436634 14 10 the the DT 10_1101-436634 14 11 splicing splicing NN 10_1101-436634 14 12 of of IN 10_1101-436634 14 13 a a DT 10_1101-436634 14 14 particular particular JJ 10_1101-436634 14 15 junction junction NN 10_1101-436634 14 16 , , , 10_1101-436634 14 17 across across IN 10_1101-436634 14 18 158,200 158,200 CD 10_1101-436634 14 19 unique unique JJ 10_1101-436634 14 20 variants variant NNS 10_1101-436634 14 21 and and CC 10_1101-436634 14 22 131,212 131,212 CD 10_1101-436634 14 23 unique unique JJ 10_1101-436634 14 24 junctions junction NNS 10_1101-436634 14 25 . . . 
10_1101-436634 15 1 To to TO 10_1101-436634 15 2 characterize characterize VB 10_1101-436634 15 3 these these DT 10_1101-436634 15 4 somatic somatic JJ 10_1101-436634 15 5 variants variant NNS 10_1101-436634 15 6 and and CC 10_1101-436634 15 7 their -PRON- PRP$ 10_1101-436634 15 8 associated associate VBN 10_1101-436634 15 9 splice splice NN 10_1101-436634 15 10 isoforms isoform NNS 10_1101-436634 15 11 , , , 10_1101-436634 15 12 we -PRON- PRP 10_1101-436634 15 13 annotated annotate VBD 10_1101-436634 15 14 them -PRON- PRP 10_1101-436634 15 15 with with IN 10_1101-436634 15 16 the the DT 10_1101-436634 15 17 Variant Variant NNP 10_1101-436634 15 18 Effect Effect NNP 10_1101-436634 15 19 Predictor Predictor NNP 10_1101-436634 15 20 ( ( -LRB- 10_1101-436634 15 21 VEP VEP NNP 10_1101-436634 15 22 ) ) -RRB- 10_1101-436634 15 23 , , , 10_1101-436634 15 24 SpliceAI SpliceAI NNP 10_1101-436634 15 25 , , , 10_1101-436634 15 26 and and CC 10_1101-436634 15 27 Genotype- Genotype- NNP 10_1101-436634 15 28 Tissue Tissue NNP 10_1101-436634 15 29 Expression Expression NNP 10_1101-436634 15 30 ( ( -LRB- 10_1101-436634 15 31 GTEx GTEx NNP 10_1101-436634 15 32 ) ) -RRB- 10_1101-436634 15 33 junction junction NN 10_1101-436634 15 34 counts count NNS 10_1101-436634 15 35 and and CC 10_1101-436634 15 36 compared compare VBD 10_1101-436634 15 37 our -PRON- PRP$ 10_1101-436634 15 38 results result NNS 10_1101-436634 15 39 to to IN 10_1101-436634 15 40 other other JJ 10_1101-436634 15 41 tools tool NNS 10_1101-436634 15 42 that that WDT 10_1101-436634 15 43 integrate integrate VBP 10_1101-436634 15 44 genomic genomic JJ 10_1101-436634 15 45 and and CC 10_1101-436634 15 46 transcriptomic transcriptomic JJ 10_1101-436634 15 47 data datum NNS 10_1101-436634 15 48 . . . 
10_1101-436634 16 1 While while IN 10_1101-436634 16 2 certain certain JJ 10_1101-436634 16 3 events event NNS 10_1101-436634 16 4 can can MD 10_1101-436634 16 5 be be VB 10_1101-436634 16 6 identified identify VBN 10_1101-436634 16 7 by by IN 10_1101-436634 16 8 the the DT 10_1101-436634 16 9 aforementioned aforementioned JJ 10_1101-436634 16 10 tools tool NNS 10_1101-436634 16 11 , , , 10_1101-436634 16 12 the the DT 10_1101-436634 16 13 unbiased unbiased JJ 10_1101-436634 16 14 nature nature NN 10_1101-436634 16 15 of of IN 10_1101-436634 16 16 RegTools RegTools NNP 10_1101-436634 16 17 has have VBZ 10_1101-436634 16 18 allowed allow VBN 10_1101-436634 16 19 us -PRON- PRP 10_1101-436634 16 20 to to TO 10_1101-436634 16 21 identify identify VB 10_1101-436634 16 22 novel novel JJ 10_1101-436634 16 23 splice splice NN 10_1101-436634 16 24 variants variant NNS 10_1101-436634 16 25 and and CC 10_1101-436634 16 26 previously previously RB 10_1101-436634 16 27 unreported unreported JJ 10_1101-436634 16 28 patterns pattern NNS 10_1101-436634 16 29 of of IN 10_1101-436634 16 30 splicing splicing NN 10_1101-436634 16 31 disruption disruption NN 10_1101-436634 16 32 in in IN 10_1101-436634 16 33 known know VBN 10_1101-436634 16 34 cancer cancer NN 10_1101-436634 16 35 drivers driver NNS 10_1101-436634 16 36 , , , 10_1101-436634 16 37 such such JJ 10_1101-436634 16 38 as as IN 10_1101-436634 16 39 TP53 TP53 NNP 10_1101-436634 16 40 , , , 10_1101-436634 16 41 CDKN2A CDKN2A NNP 10_1101-436634 16 42 , , , 10_1101-436634 16 43 and and CC 10_1101-436634 16 44 B2 B2 NNP 10_1101-436634 16 45 M M NNP 10_1101-436634 16 46 , , , 10_1101-436634 16 47 as as RB 10_1101-436634 16 48 well well RB 10_1101-436634 16 49 as as IN 10_1101-436634 16 50 in in IN 10_1101-436634 16 51 genes gene NNS 10_1101-436634 16 52 not not RB 10_1101-436634 16 53 previously previously RB 10_1101-436634 16 54 considered consider VBN 10_1101-436634 16 55 cancer cancer NN 10_1101-436634 16 56 - - HYPH 
10_1101-436634 16 57 relevant relevant JJ 10_1101-436634 16 58 , , , 10_1101-436634 16 59 such such JJ 10_1101-436634 16 60 as as IN 10_1101-436634 16 61 RNF145 RNF145 NNP 10_1101-436634 16 62 . . . 10_1101-436634 17 1 Introduction introduction NN 10_1101-436634 17 2 .CC .CC NFP 10_1101-436634 17 3 - - : 10_1101-436634 17 4 BY by IN 10_1101-436634 17 5 - - HYPH 10_1101-436634 17 6 NC NC NNP 10_1101-436634 17 7 - - HYPH 10_1101-436634 17 8 ND ND NNP 10_1101-436634 17 9 4.0 4.0 CD 10_1101-436634 17 10 International international JJ 10_1101-436634 17 11 licensea licensea NNS 10_1101-436634 17 12 certified certify VBN 10_1101-436634 17 13 by by IN 10_1101-436634 17 14 peer peer NN 10_1101-436634 17 15 review review NN 10_1101-436634 17 16 ) ) -RRB- 10_1101-436634 17 17 is be VBZ 10_1101-436634 17 18 the the DT 10_1101-436634 17 19 author author NN 10_1101-436634 17 20 / / SYM 10_1101-436634 17 21 funder funder NN 10_1101-436634 17 22 , , , 10_1101-436634 17 23 who who WP 10_1101-436634 17 24 has have VBZ 10_1101-436634 17 25 granted grant VBN 10_1101-436634 17 26 bioRxiv biorxiv IN 10_1101-436634 17 27 a a DT 10_1101-436634 17 28 license license NN 10_1101-436634 17 29 to to TO 10_1101-436634 17 30 display display VB 10_1101-436634 17 31 the the DT 10_1101-436634 17 32 preprint preprint NN 10_1101-436634 17 33 in in IN 10_1101-436634 17 34 perpetuity perpetuity NN 10_1101-436634 17 35 . . . 
10_1101-436634 18 1 It -PRON- PRP 10_1101-436634 18 2 is be VBZ 10_1101-436634 18 3 made make VBN 10_1101-436634 18 4 available available JJ 10_1101-436634 18 5 under under IN 10_1101-436634 18 6 The the DT 10_1101-436634 18 7 copyright copyright NN 10_1101-436634 18 8 holder holder NN 10_1101-436634 18 9 for for IN 10_1101-436634 18 10 this this DT 10_1101-436634 18 11 preprint preprint NN 10_1101-436634 18 12 ( ( -LRB- 10_1101-436634 18 13 which which WDT 10_1101-436634 18 14 was be VBD 10_1101-436634 18 15 notthis notthis DT 10_1101-436634 18 16 version version NN 10_1101-436634 18 17 posted post VBN 10_1101-436634 18 18 January January NNP 10_1101-436634 18 19 5 5 CD 10_1101-436634 18 20 , , , 10_1101-436634 18 21 2021 2021 CD 10_1101-436634 18 22 . . . 10_1101-436634 18 23 ; ; : 10_1101-436634 18 24 https://doi.org/10.1101/436634doi https://doi.org/10.1101/436634doi NN 10_1101-436634 18 25 : : : 10_1101-436634 18 26 bioRxiv biorxiv VB 10_1101-436634 18 27 preprint preprint NN 10_1101-436634 18 28 https://doi.org/10.1101/436634 https://doi.org/10.1101/436634 NNP 10_1101-436634 18 29 http://creativecommons.org/licenses/by-nc-nd/4.0/ http://creativecommons.org/licenses/by-nc-nd/4.0/ RB 10_1101-436634 18 30 2 2 CD 10_1101-436634 18 31 Alternative alternative JJ 10_1101-436634 18 32 splicing splicing NN 10_1101-436634 18 33 of of IN 10_1101-436634 18 34 messenger messenger NN 10_1101-436634 18 35 RNA RNA NNP 10_1101-436634 18 36 allows allow VBZ 10_1101-436634 18 37 a a DT 10_1101-436634 18 38 single single JJ 10_1101-436634 18 39 gene gene NN 10_1101-436634 18 40 to to TO 10_1101-436634 18 41 encode encode VB 10_1101-436634 18 42 multiple multiple JJ 10_1101-436634 18 43 gene gene NN 10_1101-436634 18 44 products product NNS 10_1101-436634 18 45 , , , 10_1101-436634 18 46 increasing increase VBG 10_1101-436634 18 47 a a DT 10_1101-436634 18 48 cell cell NN 10_1101-436634 18 49 ’s ’s POS 10_1101-436634 18 50 functional functional JJ 10_1101-436634 18 51 diversity 
diversity NN 10_1101-436634 18 52 and and CC 10_1101-436634 18 53 regulatory regulatory JJ 10_1101-436634 18 54 precision precision NN 10_1101-436634 18 55 . . . 10_1101-436634 19 1 However however RB 10_1101-436634 19 2 , , , 10_1101-436634 19 3 splicing splicing NN 10_1101-436634 19 4 malfunction malfunction NN 10_1101-436634 19 5 can can MD 10_1101-436634 19 6 lead lead VB 10_1101-436634 19 7 to to IN 10_1101-436634 19 8 imbalances imbalance NNS 10_1101-436634 19 9 in in IN 10_1101-436634 19 10 transcriptional transcriptional JJ 10_1101-436634 19 11 output output NN 10_1101-436634 19 12 or or CC 10_1101-436634 19 13 even even RB 10_1101-436634 19 14 the the DT 10_1101-436634 19 15 presence presence NN 10_1101-436634 19 16 of of IN 10_1101-436634 19 17 novel novel JJ 10_1101-436634 19 18 oncogenic oncogenic JJ 10_1101-436634 19 19 transcripts1 transcripts1 NN 10_1101-436634 19 20 . . . 10_1101-436634 20 1 The the DT 10_1101-436634 20 2 interpretation interpretation NN 10_1101-436634 20 3 of of IN 10_1101-436634 20 4 variants variant NNS 10_1101-436634 20 5 in in IN 10_1101-436634 20 6 cancer cancer NN 10_1101-436634 20 7 is be VBZ 10_1101-436634 20 8 frequently frequently RB 10_1101-436634 20 9 focused focus VBN 10_1101-436634 20 10 on on IN 10_1101-436634 20 11 direct direct JJ 10_1101-436634 20 12 protein- protein- JJ 10_1101-436634 20 13 coding code VBG 10_1101-436634 20 14 alterations2 alterations2 NNP 10_1101-436634 20 15 . . . 
10_1101-436634 21 1 However however RB 10_1101-436634 21 2 , , , 10_1101-436634 21 3 most most JJS 10_1101-436634 21 4 somatic somatic JJ 10_1101-436634 21 5 mutations mutation NNS 10_1101-436634 21 6 arise arise VBP 10_1101-436634 21 7 in in IN 10_1101-436634 21 8 intronic intronic JJ 10_1101-436634 21 9 and and CC 10_1101-436634 21 10 intergenic intergenic JJ 10_1101-436634 21 11 regions region NNS 10_1101-436634 21 12 , , , 10_1101-436634 21 13 and and CC 10_1101-436634 21 14 exonic exonic JJ 10_1101-436634 21 15 mutations mutation NNS 10_1101-436634 21 16 may may MD 10_1101-436634 21 17 also also RB 10_1101-436634 21 18 have have VB 10_1101-436634 21 19 unidentified unidentified JJ 10_1101-436634 21 20 regulatory regulatory JJ 10_1101-436634 21 21 consequences3,4,5,6 consequences3,4,5,6 NN 10_1101-436634 21 22 . . . 10_1101-436634 22 1 For for IN 10_1101-436634 22 2 example example NN 10_1101-436634 22 3 , , , 10_1101-436634 22 4 mutations mutation NNS 10_1101-436634 22 5 can can MD 10_1101-436634 22 6 affect affect VB 10_1101-436634 22 7 splicing splicing NN 10_1101-436634 22 8 either either CC 10_1101-436634 22 9 in in IN 10_1101-436634 22 10 trans tran NNS 10_1101-436634 22 11 , , , 10_1101-436634 22 12 by by IN 10_1101-436634 22 13 acting act VBG 10_1101-436634 22 14 on on IN 10_1101-436634 22 15 splicing splicing NN 10_1101-436634 22 16 effectors effector NNS 10_1101-436634 22 17 , , , 10_1101-436634 22 18 or or CC 10_1101-436634 22 19 in in IN 10_1101-436634 22 20 cis cis NN 10_1101-436634 22 21 , , , 10_1101-436634 22 22 by by IN 10_1101-436634 22 23 altering alter VBG 10_1101-436634 22 24 the the DT 10_1101-436634 22 25 splicing splicing NN 10_1101-436634 22 26 signals signal NNS 10_1101-436634 22 27 located locate VBN 10_1101-436634 22 28 on on IN 10_1101-436634 22 29 the the DT 10_1101-436634 22 30 transcripts transcript NNS 10_1101-436634 22 31 themselves7 themselves7 NN 10_1101-436634 22 32 . . . 
10_1101-436634 23 1 Increasingly increasingly RB 10_1101-436634 23 2 , , , 10_1101-436634 23 3 we -PRON- PRP 10_1101-436634 23 4 are be VBP 10_1101-436634 23 5 identifying identify VBG 10_1101-436634 23 6 the the DT 10_1101-436634 23 7 importance importance NN 10_1101-436634 23 8 of of IN 10_1101-436634 23 9 splice splice NN 10_1101-436634 23 10 variants variant NNS 10_1101-436634 23 11 in in IN 10_1101-436634 23 12 disease disease NN 10_1101-436634 23 13 processes process NNS 10_1101-436634 23 14 , , , 10_1101-436634 23 15 including include VBG 10_1101-436634 23 16 in in IN 10_1101-436634 23 17 cancer8,9 cancer8,9 NN 10_1101-436634 23 18 . . . 10_1101-436634 24 1 However however RB 10_1101-436634 24 2 , , , 10_1101-436634 24 3 our -PRON- PRP$ 10_1101-436634 24 4 understanding understanding NN 10_1101-436634 24 5 of of IN 10_1101-436634 24 6 the the DT 10_1101-436634 24 7 landscape landscape NN 10_1101-436634 24 8 of of IN 10_1101-436634 24 9 these these DT 10_1101-436634 24 10 variants variant NNS 10_1101-436634 24 11 is be VBZ 10_1101-436634 24 12 currently currently RB 10_1101-436634 24 13 limited limit VBN 10_1101-436634 24 14 , , , 10_1101-436634 24 15 and and CC 10_1101-436634 24 16 few few JJ 10_1101-436634 24 17 tools tool NNS 10_1101-436634 24 18 exist exist VBP 10_1101-436634 24 19 for for IN 10_1101-436634 24 20 their -PRON- PRP$ 10_1101-436634 24 21 discovery discovery NN 10_1101-436634 24 22 . . . 
10_1101-436634 25 1 One one CD 10_1101-436634 25 2 approach approach NN 10_1101-436634 25 3 to to IN 10_1101-436634 25 4 elucidating elucidate VBG 10_1101-436634 25 5 the the DT 10_1101-436634 25 6 role role NN 10_1101-436634 25 7 of of IN 10_1101-436634 25 8 splice splice NN 10_1101-436634 25 9 variants variant NNS 10_1101-436634 25 10 has have VBZ 10_1101-436634 25 11 been be VBN 10_1101-436634 25 12 to to TO 10_1101-436634 25 13 predict predict VB 10_1101-436634 25 14 the the DT 10_1101-436634 25 15 strength strength NN 10_1101-436634 25 16 of of IN 10_1101-436634 25 17 putative putative JJ 10_1101-436634 25 18 splice splice NN 10_1101-436634 25 19 sites site NNS 10_1101-436634 25 20 in in IN 10_1101-436634 25 21 pre pre NN 10_1101-436634 25 22 - - NN 10_1101-436634 25 23 mRNA mRNA NNP 10_1101-436634 25 24 from from IN 10_1101-436634 25 25 genomic genomic JJ 10_1101-436634 25 26 sequences sequence NNS 10_1101-436634 25 27 , , , 10_1101-436634 25 28 such such JJ 10_1101-436634 25 29 as as IN 10_1101-436634 25 30 the the DT 10_1101-436634 25 31 method method NN 10_1101-436634 25 32 used use VBN 10_1101-436634 25 33 by by IN 10_1101-436634 25 34 the the DT 10_1101-436634 25 35 SpliceAI SpliceAI NNP 10_1101-436634 25 36 tool10–13 tool10–13 NN 10_1101-436634 25 37 . . . 
10_1101-436634 26 1 With with IN 10_1101-436634 26 2 the the DT 10_1101-436634 26 3 advent advent NN 10_1101-436634 26 4 of of IN 10_1101-436634 26 5 efficient efficient JJ 10_1101-436634 26 6 and and CC 10_1101-436634 26 7 affordable affordable JJ 10_1101-436634 26 8 RNA RNA NNP 10_1101-436634 26 9 - - HYPH 10_1101-436634 26 10 seq seq NN 10_1101-436634 26 11 , , , 10_1101-436634 26 12 we -PRON- PRP 10_1101-436634 26 13 are be VBP 10_1101-436634 26 14 also also RB 10_1101-436634 26 15 seeing see VBG 10_1101-436634 26 16 the the DT 10_1101-436634 26 17 complementary complementary JJ 10_1101-436634 26 18 approach approach NN 10_1101-436634 26 19 of of IN 10_1101-436634 26 20 evaluating evaluate VBG 10_1101-436634 26 21 alternative alternative JJ 10_1101-436634 26 22 splicing splicing NN 10_1101-436634 26 23 events event NNS 10_1101-436634 26 24 ( ( -LRB- 10_1101-436634 26 25 ASEs ASEs NNP 10_1101-436634 26 26 ) ) -RRB- 10_1101-436634 26 27 directly directly RB 10_1101-436634 26 28 from from IN 10_1101-436634 26 29 RNA RNA NNP 10_1101-436634 26 30 sequencing sequencing NN 10_1101-436634 26 31 data datum NNS 10_1101-436634 26 32 . . . 
10_1101-436634 27 1 Various various JJ 10_1101-436634 27 2 tools tool NNS 10_1101-436634 27 3 exist exist VBP 10_1101-436634 27 4 which which WDT 10_1101-436634 27 5 allow allow VBP 10_1101-436634 27 6 the the DT 10_1101-436634 27 7 identification identification NN 10_1101-436634 27 8 of of IN 10_1101-436634 27 9 significant significant JJ 10_1101-436634 27 10 ASEs as NNS 10_1101-436634 27 11 from from IN 10_1101-436634 27 12 transcript transcript JJ 10_1101-436634 27 13 - - HYPH 10_1101-436634 27 14 level level NN 10_1101-436634 27 15 data datum NNS 10_1101-436634 27 16 within within IN 10_1101-436634 27 17 sample sample NN 10_1101-436634 27 18 cohorts cohort NNS 10_1101-436634 27 19 , , , 10_1101-436634 27 20 including include VBG 10_1101-436634 27 21 SUPPA2 suppa2 ADD 10_1101-436634 27 22 and and CC 10_1101-436634 27 23 SPLADDER14,15 SPLADDER14,15 NNP 10_1101-436634 27 24 . . . 10_1101-436634 28 1 Many many JJ 10_1101-436634 28 2 of of IN 10_1101-436634 28 3 these these DT 10_1101-436634 28 4 tools tool NNS 10_1101-436634 28 5 have have VBP 10_1101-436634 28 6 also also RB 10_1101-436634 28 7 evaluated evaluate VBN 10_1101-436634 28 8 the the DT 10_1101-436634 28 9 role role NN 10_1101-436634 28 10 of of IN 10_1101-436634 28 11 trans trans JJ 10_1101-436634 28 12 - - JJ 10_1101-436634 28 13 acting act VBG 10_1101-436634 28 14 splice splice NN 10_1101-436634 28 15 mutations16 mutations16 NN 10_1101-436634 28 16 . . . 
10_1101-436634 29 1 However however RB 10_1101-436634 29 2 , , , 10_1101-436634 29 3 few few JJ 10_1101-436634 29 4 tools tool NNS 10_1101-436634 29 5 are be VBP 10_1101-436634 29 6 directed direct VBN 10_1101-436634 29 7 at at IN 10_1101-436634 29 8 linking link VBG 10_1101-436634 29 9 specific specific JJ 10_1101-436634 29 10 aberrant aberrant NN 10_1101-436634 29 11 RNA RNA NNP 10_1101-436634 29 12 splicing splice VBG 10_1101-436634 29 13 events event NNS 10_1101-436634 29 14 to to IN 10_1101-436634 29 15 specific specific JJ 10_1101-436634 29 16 genomic genomic JJ 10_1101-436634 29 17 variants variant NNS 10_1101-436634 29 18 in in IN 10_1101-436634 29 19 cis cis NNP 10_1101-436634 29 20 to to TO 10_1101-436634 29 21 investigate investigate VB 10_1101-436634 29 22 the the DT 10_1101-436634 29 23 splice splice NN 10_1101-436634 29 24 regulatory regulatory JJ 10_1101-436634 29 25 impact impact NN 10_1101-436634 29 26 of of IN 10_1101-436634 29 27 these these DT 10_1101-436634 29 28 variants variant NNS 10_1101-436634 29 29 . . . 10_1101-436634 30 1 Those those DT 10_1101-436634 30 2 few few JJ 10_1101-436634 30 3 relevant relevant JJ 10_1101-436634 30 4 tools tool NNS 10_1101-436634 30 5 that that WDT 10_1101-436634 30 6 do do VBP 10_1101-436634 30 7 exist exist VB 10_1101-436634 30 8 have have VB 10_1101-436634 30 9 significant significant JJ 10_1101-436634 30 10 limitations limitation NNS 10_1101-436634 30 11 that that WDT 10_1101-436634 30 12 preclude preclude VBP 10_1101-436634 30 13 them -PRON- PRP 10_1101-436634 30 14 from from IN 10_1101-436634 30 15 broad broad JJ 10_1101-436634 30 16 applications application NNS 10_1101-436634 30 17 . . . 
10_1101-436634 31 1 The the DT 10_1101-436634 31 2 sQTL sqtl NN 10_1101-436634 31 3 - - HYPH 10_1101-436634 31 4 based base VBN 10_1101-436634 31 5 approach approach NN 10_1101-436634 31 6 taken take VBN 10_1101-436634 31 7 by by IN 10_1101-436634 31 8 LeafCutter LeafCutter NNP 10_1101-436634 31 9 and and CC 10_1101-436634 31 10 other other JJ 10_1101-436634 31 11 tools tool NNS 10_1101-436634 31 12 is be VBZ 10_1101-436634 31 13 designed design VBN 10_1101-436634 31 14 for for IN 10_1101-436634 31 15 relatively relatively RB 10_1101-436634 31 16 frequent frequent JJ 10_1101-436634 31 17 single single JJ 10_1101-436634 31 18 - - HYPH 10_1101-436634 31 19 nucleotide nucleotide JJ 10_1101-436634 31 20 polymorphisms polymorphism NNS 10_1101-436634 31 21 . . . 10_1101-436634 32 1 It -PRON- PRP 10_1101-436634 32 2 is be VBZ 10_1101-436634 32 3 thus thus RB 10_1101-436634 32 4 ill ill RB 10_1101-436634 32 5 - - HYPH 10_1101-436634 32 6 suited suit VBN 10_1101-436634 32 7 to to IN 10_1101-436634 32 8 studying study VBG 10_1101-436634 32 9 somatic somatic JJ 10_1101-436634 32 10 variants variant NNS 10_1101-436634 32 11 , , , 10_1101-436634 32 12 or or CC 10_1101-436634 32 13 any any DT 10_1101-436634 32 14 case case NN 10_1101-436634 32 15 in in IN 10_1101-436634 32 16 which which WDT 10_1101-436634 32 17 the the DT 10_1101-436634 32 18 frequency frequency NN 10_1101-436634 32 19 of of IN 10_1101-436634 32 20 a a DT 10_1101-436634 32 21 particular particular JJ 10_1101-436634 32 22 variant variant NN 10_1101-436634 32 23 is be VBZ 10_1101-436634 32 24 very very RB 10_1101-436634 32 25 low low JJ 10_1101-436634 32 26 ( ( -LRB- 10_1101-436634 32 27 often often RB 10_1101-436634 32 28 unique unique JJ 10_1101-436634 32 29 ) ) -RRB- 10_1101-436634 32 30 in in IN 10_1101-436634 32 31 a a DT 10_1101-436634 32 32 given give VBN 10_1101-436634 32 33 sample sample NN 10_1101-436634 32 34 population17–19 population17–19 NN 10_1101-436634 32 35 . . . 
10_1101-436634 33 1 Recent recent JJ 10_1101-436634 33 2 tools tool NNS 10_1101-436634 33 3 that that WDT 10_1101-436634 33 4 have have VBP 10_1101-436634 33 5 been be VBN 10_1101-436634 33 6 created create VBN 10_1101-436634 33 7 for for IN 10_1101-436634 33 8 large large JJ 10_1101-436634 33 9 - - HYPH 10_1101-436634 33 10 scale scale NN 10_1101-436634 33 11 analysis analysis NN 10_1101-436634 33 12 of of IN 10_1101-436634 33 13 cancer cancer NN 10_1101-436634 33 14 - - HYPH 10_1101-436634 33 15 specific specific JJ 10_1101-436634 33 16 data datum NNS 10_1101-436634 33 17 , , , 10_1101-436634 33 18 such such JJ 10_1101-436634 33 19 as as IN 10_1101-436634 33 20 MiSplice MiSplice NNP 10_1101-436634 33 21 and and CC 10_1101-436634 33 22 Veridical Veridical NNP 10_1101-436634 33 23 , , , 10_1101-436634 33 24 ignore ignore VB 10_1101-436634 33 25 certain certain JJ 10_1101-436634 33 26 types type NNS 10_1101-436634 33 27 of of IN 10_1101-436634 33 28 ASEs as NNS 10_1101-436634 33 29 , , , 10_1101-436634 33 30 are be VBP 10_1101-436634 33 31 tailored tailor VBN 10_1101-436634 33 32 to to IN 10_1101-436634 33 33 specific specific JJ 10_1101-436634 33 34 analysis analysis NN 10_1101-436634 33 35 strategies strategy NNS 10_1101-436634 33 36 and and CC 10_1101-436634 33 37 sets set NNS 10_1101-436634 33 38 of of IN 10_1101-436634 33 39 hypotheses hypothesis NNS 10_1101-436634 33 40 , , , 10_1101-436634 33 41 or or CC 10_1101-436634 33 42 are be VBP 10_1101-436634 33 43 otherwise otherwise RB 10_1101-436634 33 44 inaccessible inaccessible JJ 10_1101-436634 33 45 to to IN 10_1101-436634 33 46 the the DT 10_1101-436634 33 47 end end NN 10_1101-436634 33 48 - - HYPH 10_1101-436634 33 49 user user NN 10_1101-436634 33 50 due due JJ 10_1101-436634 33 51 to to IN 10_1101-436634 33 52 issues issue NNS 10_1101-436634 33 53 such such JJ 10_1101-436634 33 54 as as IN 10_1101-436634 33 55 lack lack NN 10_1101-436634 33 56 of of IN 10_1101-436634 33 57 documentation documentation NN 
10_1101-436634 33 58 , , , 10_1101-436634 33 59 difficulty difficulty NN 10_1101-436634 33 60 with with IN 10_1101-436634 33 61 installation installation NN 10_1101-436634 33 62 and and CC 10_1101-436634 33 63 integration integration NN 10_1101-436634 33 64 with with IN 10_1101-436634 33 65 existing exist VBG 10_1101-436634 33 66 pipelines pipeline NNS 10_1101-436634 33 67 , , , 10_1101-436634 33 68 limited limited JJ 10_1101-436634 33 69 computing computing NN 10_1101-436634 33 70 efficiency efficiency NN 10_1101-436634 33 71 , , , 10_1101-436634 33 72 or or CC 10_1101-436634 33 73 licensing license VBG 10_1101-436634 33 74 issues20–22 issues20–22 NNP 10_1101-436634 33 75 . . . 10_1101-436634 34 1 To to TO 10_1101-436634 34 2 address address VB 10_1101-436634 34 3 these these DT 10_1101-436634 34 4 needs need NNS 10_1101-436634 34 5 , , , 10_1101-436634 34 6 we -PRON- PRP 10_1101-436634 34 7 have have VBP 10_1101-436634 34 8 developed develop VBN 10_1101-436634 34 9 RegTools RegTools NNP 10_1101-436634 34 10 , , , 10_1101-436634 34 11 a a DT 10_1101-436634 34 12 free free JJ 10_1101-436634 34 13 , , , 10_1101-436634 34 14 open open JJ 10_1101-436634 34 15 - - HYPH 10_1101-436634 34 16 source source NN 10_1101-436634 34 17 ( ( -LRB- 10_1101-436634 34 18 MIT MIT NNP 10_1101-436634 34 19 license license NN 10_1101-436634 34 20 ) ) -RRB- 10_1101-436634 34 21 software software NN 10_1101-436634 34 22 package package NN 10_1101-436634 34 23 that that WDT 10_1101-436634 34 24 is be VBZ 10_1101-436634 34 25 well well RB 10_1101-436634 34 26 - - HYPH 10_1101-436634 34 27 documented document VBN 10_1101-436634 34 28 , , , 10_1101-436634 34 29 modularized modularize VBN 10_1101-436634 34 30 for for IN 10_1101-436634 34 31 ease ease NN 10_1101-436634 34 32 of of IN 10_1101-436634 34 33 use use NN 10_1101-436634 34 34 , , , 10_1101-436634 34 35 and and CC 10_1101-436634 34 36 designed design VBN 10_1101-436634 34 37 to to TO 10_1101-436634 34 38 efficiently efficiently RB 
10_1101-436634 34 39 identify identify VB 10_1101-436634 34 40 potential potential JJ 10_1101-436634 34 41 cis cis NN 10_1101-436634 34 42 - - HYPH 10_1101-436634 34 43 acting act VBG 10_1101-436634 34 44 splice splice NN 10_1101-436634 34 45 - - HYPH 10_1101-436634 34 46 relevant relevant JJ 10_1101-436634 34 47 variants variant NNS 10_1101-436634 34 48 in in IN 10_1101-436634 34 49 tumors tumor NNS 10_1101-436634 34 50 ( ( -LRB- 10_1101-436634 34 51 www.regtools.org www.regtools.org NNP 10_1101-436634 34 52 ) ) -RRB- 10_1101-436634 34 53 . . . 10_1101-436634 35 1 RegTools RegTools NNP 10_1101-436634 35 2 is be VBZ 10_1101-436634 35 3 a a DT 10_1101-436634 35 4 suite suite NN 10_1101-436634 35 5 of of IN 10_1101-436634 35 6 tools tool NNS 10_1101-436634 35 7 designed design VBN 10_1101-436634 35 8 to to TO 10_1101-436634 35 9 aid aid VB 10_1101-436634 35 10 users user NNS 10_1101-436634 35 11 in in IN 10_1101-436634 35 12 a a DT 10_1101-436634 35 13 broad broad JJ 10_1101-436634 35 14 range range NN 10_1101-436634 35 15 of of IN 10_1101-436634 35 16 splicing splicing NN 10_1101-436634 35 17 - - HYPH 10_1101-436634 35 18 related relate VBN 10_1101-436634 35 19 analyses analysis NNS 10_1101-436634 35 20 . . . 
10_1101-436634 36 1 At at IN 10_1101-436634 36 2 the the DT 10_1101-436634 36 3 highest high JJS 10_1101-436634 36 4 level level NN 10_1101-436634 36 5 , , , 10_1101-436634 36 6 it -PRON- PRP 10_1101-436634 36 7 contains contain VBZ 10_1101-436634 36 8 three three CD 10_1101-436634 36 9 sub sub NN 10_1101-436634 36 10 - - HYPH 10_1101-436634 36 11 modules module NNS 10_1101-436634 36 12 : : : 10_1101-436634 36 13 a a DT 10_1101-436634 36 14 variants variant NNS 10_1101-436634 36 15 module module NN 10_1101-436634 36 16 to to TO 10_1101-436634 36 17 annotate annotate VB 10_1101-436634 36 18 variant variant JJ 10_1101-436634 36 19 calls call NNS 10_1101-436634 36 20 with with IN 10_1101-436634 36 21 respect respect NN 10_1101-436634 36 22 to to IN 10_1101-436634 36 23 their -PRON- PRP$ 10_1101-436634 36 24 potential potential JJ 10_1101-436634 36 25 splicing splicing NN 10_1101-436634 36 26 relevance relevance NN 10_1101-436634 36 27 , , , 10_1101-436634 36 28 a a DT 10_1101-436634 36 29 junctions junction NNS 10_1101-436634 36 30 module module NN 10_1101-436634 36 31 to to TO 10_1101-436634 36 32 analyze analyze VB 10_1101-436634 36 33 aligned align VBN 10_1101-436634 36 34 RNA RNA NNP 10_1101-436634 36 35 - - HYPH 10_1101-436634 36 36 seq seq NN 10_1101-436634 36 37 data datum NNS 10_1101-436634 36 38 and and CC 10_1101-436634 36 39 associated associate VBD 10_1101-436634 36 40 splicing splicing NN 10_1101-436634 36 41 events event NNS 10_1101-436634 36 42 , , , 10_1101-436634 36 43 and and CC 10_1101-436634 36 44 a a DT 10_1101-436634 36 45 cis cis NN 10_1101-436634 36 46 - - HYPH 10_1101-436634 36 47 splice splice NN 10_1101-436634 36 48 - - HYPH 10_1101-436634 36 49 effects effect NNS 10_1101-436634 36 50 module module NN 10_1101-436634 36 51 that that WDT 10_1101-436634 36 52 integrates integrate VBZ 10_1101-436634 36 53 genomic genomic JJ 10_1101-436634 36 54 variant variant JJ 10_1101-436634 36 55 calls call NNS 10_1101-436634 36 56 and and CC 10_1101-436634 
36 57 transcriptomic transcriptomic JJ 10_1101-436634 36 58 sequencing sequencing NN 10_1101-436634 36 59 data datum NNS 10_1101-436634 36 60 to to TO 10_1101-436634 36 61 identify identify VB 10_1101-436634 36 62 potential potential JJ 10_1101-436634 36 63 splice splice NN 10_1101-436634 36 64 - - HYPH 10_1101-436634 36 65 altering alter VBG 10_1101-436634 36 66 variants variant NNS 10_1101-436634 36 67 . . . 10_1101-436634 37 1 Each each DT 10_1101-436634 37 2 sub sub JJ 10_1101-436634 37 3 - - HYPH 10_1101-436634 37 4 module module NN 10_1101-436634 37 5 contains contain VBZ 10_1101-436634 37 6 one one CD 10_1101-436634 37 7 or or CC 10_1101-436634 37 8 more more JJR 10_1101-436634 37 9 commands command NNS 10_1101-436634 37 10 , , , 10_1101-436634 37 11 which which WDT 10_1101-436634 37 12 can can MD 10_1101-436634 37 13 be be VB 10_1101-436634 37 14 used use VBN 10_1101-436634 37 15 individually individually RB 10_1101-436634 37 16 or or CC 10_1101-436634 37 17 integrated integrate VBN 10_1101-436634 37 18 into into IN 10_1101-436634 37 19 regulatory regulatory JJ 10_1101-436634 37 20 variant variant JJ 10_1101-436634 37 21 analysis analysis NN 10_1101-436634 37 22 pipelines pipeline NNS 10_1101-436634 37 23 . . . 
10_1101-436634 38 1 To to TO 10_1101-436634 38 2 demonstrate demonstrate VB 10_1101-436634 38 3 the the DT 10_1101-436634 38 4 utility utility NN 10_1101-436634 38 5 of of IN 10_1101-436634 38 6 RegTools RegTools NNP 10_1101-436634 38 7 in in IN 10_1101-436634 38 8 identifying identify VBG 10_1101-436634 38 9 potential potential JJ 10_1101-436634 38 10 splice splice NN 10_1101-436634 38 11 - - HYPH 10_1101-436634 38 12 relevant relevant JJ 10_1101-436634 38 13 variants variant NNS 10_1101-436634 38 14 from from IN 10_1101-436634 38 15 tumor tumor NN 10_1101-436634 38 16 data datum NNS 10_1101-436634 38 17 , , , 10_1101-436634 38 18 we -PRON- PRP 10_1101-436634 38 19 analyzed analyze VBD 10_1101-436634 38 20 a a DT 10_1101-436634 38 21 combination combination NN 10_1101-436634 38 22 of of IN 10_1101-436634 38 23 data datum NNS 10_1101-436634 38 24 available available JJ 10_1101-436634 38 25 from from IN 10_1101-436634 38 26 the the DT 10_1101-436634 38 27 McDonnell McDonnell NNP 10_1101-436634 38 28 Genome Genome NNP 10_1101-436634 38 29 Institute Institute NNP 10_1101-436634 38 30 ( ( -LRB- 10_1101-436634 38 31 MGI MGI NNP 10_1101-436634 38 32 ) ) -RRB- 10_1101-436634 38 33 at at IN 10_1101-436634 38 34 Washington Washington NNP 10_1101-436634 38 35 University University NNP 10_1101-436634 38 36 School School NNP 10_1101-436634 38 37 of of IN 10_1101-436634 38 38 Medicine Medicine NNP 10_1101-436634 38 39 and and CC 10_1101-436634 38 40 The the DT 10_1101-436634 38 41 Cancer Cancer NNP 10_1101-436634 38 42 Genome Genome NNP 10_1101-436634 38 43 Atlas Atlas NNP 10_1101-436634 38 44 ( ( -LRB- 10_1101-436634 38 45 TCGA TCGA NNP 10_1101-436634 38 46 ) ) -RRB- 10_1101-436634 38 47 project project NN 10_1101-436634 38 48 . . . 
10_1101-436634 39 1 In in IN 10_1101-436634 40 31 total total NN 10_1101-436634 40 32 , , , 10_1101-436634 40 33 we -PRON- PRP 10_1101-436634 40 34 applied apply VBD 10_1101-436634 40 35 RegTools RegTools NNP 10_1101-436634 40 36 to to IN 10_1101-436634 40 37 9,173 9,173 CD 10_1101-436634 40 38 samples sample NNS 10_1101-436634 40 39 across across IN 10_1101-436634 40 40 35 35 CD 10_1101-436634 40 41 cancer cancer NN 10_1101-436634 40 42 types type NNS 10_1101-436634 40 43 . . . 
10_1101-436634 41 1 We -PRON- PRP 10_1101-436634 41 2 contrasted contrast VBD 10_1101-436634 41 3 our -PRON- PRP$ 10_1101-436634 41 4 results result NNS 10_1101-436634 41 5 with with IN 10_1101-436634 41 6 other other JJ 10_1101-436634 41 7 tools tool NNS 10_1101-436634 41 8 that that WDT 10_1101-436634 41 9 integrate integrate VBP 10_1101-436634 41 10 genomic genomic JJ 10_1101-436634 41 11 and and CC 10_1101-436634 41 12 transcriptomic transcriptomic JJ 10_1101-436634 41 13 data datum NNS 10_1101-436634 41 14 to to TO 10_1101-436634 41 15 identify identify VB 10_1101-436634 41 16 potential potential JJ 10_1101-436634 41 17 splice splice NN 10_1101-436634 41 18 altering alter VBG 10_1101-436634 41 19 variants variant NNS 10_1101-436634 41 20 , , , 10_1101-436634 41 21 specifically specifically RB 10_1101-436634 41 22 Veridical veridical JJ 10_1101-436634 41 23 , , , 10_1101-436634 41 24 MiSplice MiSplice NNP 10_1101-436634 41 25 , , , 10_1101-436634 41 26 and and CC 10_1101-436634 41 27 SAVNet20,21,23 SAVNet20,21,23 NNS 10_1101-436634 41 28 . . . 
10_1101-436634 42 1 Novel novel JJ 10_1101-436634 42 2 junctions junction NNS 10_1101-436634 42 3 identified identify VBN 10_1101-436634 42 4 by by IN 10_1101-436634 42 5 RegTools regtool NNS 10_1101-436634 42 6 were be VBD 10_1101-436634 42 7 compared compare VBN 10_1101-436634 42 8 to to IN 10_1101-436634 42 9 data datum NNS 10_1101-436634 42 10 from from IN 10_1101-436634 42 11 The the DT 10_1101-436634 42 12 Genotype Genotype NNP 10_1101-436634 42 13 - - HYPH 10_1101-436634 42 14 Tissue Tissue NNP 10_1101-436634 42 15 Expression expression NN 10_1101-436634 42 16 ( ( -LRB- 10_1101-436634 42 17 GTEx GTEx NNP 10_1101-436634 42 18 ) ) -RRB- 10_1101-436634 42 19 project project NN 10_1101-436634 42 20 to to TO 10_1101-436634 42 21 assess assess VB 10_1101-436634 42 22 whether whether IN 10_1101-436634 42 23 these these DT 10_1101-436634 42 24 junctions junction NNS 10_1101-436634 42 25 are be VBP 10_1101-436634 42 26 present present JJ 10_1101-436634 42 27 in in IN 10_1101-436634 42 28 normal normal JJ 10_1101-436634 42 29 tissues24 tissues24 NNS 10_1101-436634 42 30 . . . 
10_1101-436634 43 1 Variants variant NNS 10_1101-436634 43 2 significantly significantly RB 10_1101-436634 43 3 associated associate VBN 10_1101-436634 43 4 with with IN 10_1101-436634 43 5 novel novel JJ 10_1101-436634 43 6 junctions junction NNS 10_1101-436634 43 7 were be VBD 10_1101-436634 43 8 processed process VBN 10_1101-436634 43 9 through through IN 10_1101-436634 43 10 VEP VEP NNP 10_1101-436634 43 11 and and CC 10_1101-436634 43 12 Illumina Illumina NNP 10_1101-436634 43 13 ’s ’s POS 10_1101-436634 43 14 SpliceAI SpliceAI VBN 10_1101-436634 43 15 tool tool NN 10_1101-436634 43 16 to to TO 10_1101-436634 43 17 compare compare VB 10_1101-436634 43 18 our -PRON- PRP$ 10_1101-436634 43 19 findings finding NNS 10_1101-436634 43 20 with with IN 10_1101-436634 43 21 splicing splicing NN 10_1101-436634 43 22 consequences consequence NNS 10_1101-436634 43 23 predicted predict VBD 10_1101-436634 43 24 based base VBN 10_1101-436634 43 25 on on IN 10_1101-436634 43 26 the the DT 10_1101-436634 43 27 variant variant JJ 10_1101-436634 43 28 information information NN 10_1101-436634 43 29 alone13,25 alone13,25 NNP 10_1101-436634 43 30 . . . 
10_1101-436634 44 1 With with IN 10_1101-436634 44 2 this this DT 10_1101-436634 44 3 additional additional JJ 10_1101-436634 44 4 analysis analysis NN 10_1101-436634 44 5 , , , 10_1101-436634 44 6 we -PRON- PRP 10_1101-436634 44 7 were be VBD 10_1101-436634 44 8 able able JJ 10_1101-436634 44 9 to to TO 10_1101-436634 44 10 more more RBR 10_1101-436634 44 11 easily easily RB 10_1101-436634 44 12 identify identify VB 10_1101-436634 44 13 both both DT 10_1101-436634 44 14 variants variant NNS 10_1101-436634 44 15 in in IN 10_1101-436634 44 16 known know VBN 10_1101-436634 44 17 cancer cancer NN 10_1101-436634 44 18 drivers driver NNS 10_1101-436634 44 19 , , , 10_1101-436634 44 20 whose whose WP$ 10_1101-436634 44 21 splicing splicing NN 10_1101-436634 44 22 consequences consequence NNS 10_1101-436634 44 23 have have VBP 10_1101-436634 44 24 not not RB 10_1101-436634 44 25 been be VBN 10_1101-436634 44 26 previously previously RB 10_1101-436634 44 27 reported report VBN 10_1101-436634 44 28 in in IN 10_1101-436634 44 29 the the DT 10_1101-436634 44 30 literature literature NN 10_1101-436634 44 31 , , , 10_1101-436634 44 32 and and CC 10_1101-436634 44 33 potentially potentially RB 10_1101-436634 44 34 novel novel JJ 10_1101-436634 44 35 cancer cancer NN 10_1101-436634 44 36 drivers driver NNS 10_1101-436634 44 37 , , , 10_1101-436634 44 38 whose whose WP$ 10_1101-436634 44 39 disruption disruption NN 10_1101-436634 44 40 relies rely VBZ 10_1101-436634 44 41 on on IN 10_1101-436634 44 42 splice splice NN 10_1101-436634 44 43 - - HYPH 10_1101-436634 44 44 altering alter VBG 10_1101-436634 44 45 mutations mutation NNS 10_1101-436634 44 46 Results result NNS 10_1101-436634 44 47 The the DT 10_1101-436634 44 48 RegTools RegTools NNP 10_1101-436634 44 49 tool tool NN 10_1101-436634 44 50 suite suite NN 10_1101-436634 44 51 supports support NNS 10_1101-436634 44 52 splice splice NN 10_1101-436634 44 53 regulatory regulatory JJ 10_1101-436634 44 54 variant variant JJ 
10_1101-436634 44 55 discovery discovery NN 10_1101-436634 44 56 by by IN 10_1101-436634 44 57 the the DT 10_1101-436634 44 58 integration integration NN 10_1101-436634 44 59 of of IN 10_1101-436634 44 60 genome genome NN 10_1101-436634 44 61 and and CC 10_1101-436634 44 62 transcriptome transcriptome DT 10_1101-436634 44 63 data datum NNS 10_1101-436634 44 64 . . . 10_1101-436634 45 1 RegTools RegTools NNP 10_1101-436634 45 2 is be VBZ 10_1101-436634 45 3 a a DT 10_1101-436634 45 4 suite suite NN 10_1101-436634 45 5 of of IN 10_1101-436634 45 6 tools tool NNS 10_1101-436634 45 7 designed design VBN 10_1101-436634 45 8 to to TO 10_1101-436634 45 9 aid aid VB 10_1101-436634 45 10 users user NNS 10_1101-436634 45 11 in in IN 10_1101-436634 45 12 a a DT 10_1101-436634 45 13 broad broad JJ 10_1101-436634 45 14 range range NN 10_1101-436634 45 15 of of IN 10_1101-436634 45 16 splicing splicing NN 10_1101-436634 45 17 - - HYPH 10_1101-436634 45 18 related relate VBN 10_1101-436634 45 19 analyses analysis NNS 10_1101-436634 45 20 . . . 10_1101-436634 46 1 The the DT 10_1101-436634 46 2 variants variant NNS 10_1101-436634 46 3 module module NN 10_1101-436634 46 4 contains contain VBZ 10_1101-436634 46 5 the the DT 10_1101-436634 46 6 annotate annotate JJ 10_1101-436634 46 7 command command NN 10_1101-436634 46 8 . . . 
10_1101-436634 47 1 The the DT 10_1101-436634 47 2 variants variant NNS 10_1101-436634 47 3 annotate annotate VBP 10_1101-436634 47 4 command command NN 10_1101-436634 47 5 takes take VBZ 10_1101-436634 47 6 a a DT 10_1101-436634 47 7 VCF VCF NNP 10_1101-436634 47 8 of of IN 10_1101-436634 47 9 somatic somatic JJ 10_1101-436634 47 10 variant variant JJ 10_1101-436634 47 11 calls call NNS 10_1101-436634 47 12 and and CC 10_1101-436634 47 13 a a DT 10_1101-436634 47 14 GTF gtf NN 10_1101-436634 47 15 of of IN 10_1101-436634 47 16 transcriptome transcriptome DT 10_1101-436634 47 17 annotations annotation NNS 10_1101-436634 47 18 as as IN 10_1101-436634 47 19 input input NN 10_1101-436634 47 20 . . . 10_1101-436634 48 1 RegTools RegTools NNP 10_1101-436634 48 2 does do VBZ 10_1101-436634 48 3 not not RB 10_1101-436634 48 4 have have VB 10_1101-436634 48 5 any any DT 10_1101-436634 48 6 particular particular JJ 10_1101-436634 48 7 preference preference NN 10_1101-436634 48 8 for for IN 10_1101-436634 48 9 variant variant JJ 10_1101-436634 48 10 callers caller NNS 10_1101-436634 48 11 or or CC 10_1101-436634 48 12 reference reference NN 10_1101-436634 48 13 annotations annotation NNS 10_1101-436634 48 14 . . . 
10_1101-436634 49 1 Each each DT 10_1101-436634 49 2 variant variant NN 10_1101-436634 49 3 is be VBZ 10_1101-436634 49 4 annotated annotate VBN 10_1101-436634 49 5 by by IN 10_1101-436634 49 6 RegTools RegTools NNP 10_1101-436634 49 7 with with IN 10_1101-436634 49 8 known know VBN 10_1101-436634 49 9 overlapping overlap VBG 10_1101-436634 49 10 genes gene NNS 10_1101-436634 49 11 and and CC 10_1101-436634 49 12 transcripts transcript NNS 10_1101-436634 49 13 , , , 10_1101-436634 49 14 and and CC 10_1101-436634 49 15 is be VBZ 10_1101-436634 49 16 categorized categorize VBN 10_1101-436634 49 17 into into IN 10_1101-436634 49 18 one one CD 10_1101-436634 49 19 of of IN 10_1101-436634 49 20 several several JJ 10_1101-436634 49 21 user user NN 10_1101-436634 49 22 - - HYPH 10_1101-436634 49 23 configurable configurable JJ 10_1101-436634 49 24 “ " `` 10_1101-436634 49 25 variant variant JJ 10_1101-436634 49 26 types type NNS 10_1101-436634 49 27 ” " '' 10_1101-436634 49 28 , , , 10_1101-436634 49 29 based base VBN 10_1101-436634 49 30 on on IN 10_1101-436634 49 31 position position NN 10_1101-436634 49 32 relative relative JJ 10_1101-436634 49 33 to to IN 10_1101-436634 49 34 the the DT 10_1101-436634 49 35 edges edge NNS 10_1101-436634 49 36 of of IN 10_1101-436634 49 37 known know VBN 10_1101-436634 49 38 exons exon NNS 10_1101-436634 49 39 . . . 
10_1101-436634 50 1 The the DT 10_1101-436634 50 2 variant variant JJ 10_1101-436634 50 3 type type NN 10_1101-436634 50 4 annotation annotation NN 10_1101-436634 50 5 depends depend VBZ 10_1101-436634 50 6 on on IN 10_1101-436634 50 7 the the DT 10_1101-436634 50 8 stringency stringency NN 10_1101-436634 50 9 for for IN 10_1101-436634 50 10 splicing splicing NN 10_1101-436634 50 11 - - HYPH 10_1101-436634 50 12 relevance relevance NN 10_1101-436634 50 13 that that IN 10_1101-436634 50 14 the the DT 10_1101-436634 50 15 user user NN 10_1101-436634 50 16 sets set VBZ 10_1101-436634 50 17 with with IN 10_1101-436634 50 18 the the DT 10_1101-436634 50 19 “ " `` 10_1101-436634 50 20 splice splice NN 10_1101-436634 50 21 variant variant JJ 10_1101-436634 50 22 window window NN 10_1101-436634 50 23 ” " '' 10_1101-436634 50 24 setting setting NN 10_1101-436634 50 25 . . . 10_1101-436634 51 1 By by IN 10_1101-436634 51 2 default default NN 10_1101-436634 51 3 , , , 10_1101-436634 51 4 RegTools RegTools NNP 10_1101-436634 51 5 marks mark VBZ 10_1101-436634 51 6 intronic intronic JJ 10_1101-436634 51 7 variants variant NNS 10_1101-436634 51 8 within within IN 10_1101-436634 51 9 2 2 CD 10_1101-436634 51 10 bp bp NN 10_1101-436634 51 11 of of IN 10_1101-436634 51 12 the the DT 10_1101-436634 51 13 exon exon JJ 10_1101-436634 51 14 edge edge NN 10_1101-436634 51 15 as as IN 10_1101-436634 51 16 “ " `` 10_1101-436634 51 17 splicing splice VBG 10_1101-436634 51 18 intronic intronic JJ 10_1101-436634 51 19 ” " '' 10_1101-436634 51 20 , , , 10_1101-436634 51 21 exonic exonic JJ 10_1101-436634 51 22 variants variant NNS 10_1101-436634 51 23 within within IN 10_1101-436634 51 24 3 3 CD 10_1101-436634 51 25 bp bp NN 10_1101-436634 51 26 as as IN 10_1101-436634 51 27 “ " `` 10_1101-436634 51 28 splicing splice VBG 10_1101-436634 51 29 exonic exonic JJ 10_1101-436634 51 30 ” " '' 10_1101-436634 51 31 , , , 10_1101-436634 51 32 other other JJ 10_1101-436634 51 33 intronic intronic JJ 
10_1101-436634 51 34 variants variant NNS 10_1101-436634 51 35 as as IN 10_1101-436634 51 36 “ " `` 10_1101-436634 51 37 intronic intronic JJ 10_1101-436634 51 38 ” " '' 10_1101-436634 51 39 , , , 10_1101-436634 51 40 and and CC 10_1101-436634 51 41 other other JJ 10_1101-436634 51 42 exonic exonic JJ 10_1101-436634 51 43 variants variant NNS 10_1101-436634 51 44 simply simply RB 10_1101-436634 51 45 as as IN 10_1101-436634 51 46 “ " `` 10_1101-436634 51 47 exonic exonic JJ 10_1101-436634 51 48 . . . 10_1101-436634 51 49 ” " '' 10_1101-436634 51 50 RegTools RegTools NNP 10_1101-436634 51 51 considers consider VBZ 10_1101-436634 51 52 only only RB 10_1101-436634 51 53 “ " `` 10_1101-436634 51 54 splicing splice VBG 10_1101-436634 51 55 intronic intronic JJ 10_1101-436634 51 56 ” " '' 10_1101-436634 51 57 and and CC 10_1101-436634 51 58 “ " `` 10_1101-436634 51 59 splicing splice VBG 10_1101-436634 51 60 exonic exonic JJ 10_1101-436634 51 61 ” " '' 10_1101-436634 51 62 as as IN 10_1101-436634 51 63 important important JJ 10_1101-436634 51 64 . . . 
10_1101-436634 52 1 To to TO 10_1101-436634 52 2 allow allow VB 10_1101-436634 52 3 for for IN 10_1101-436634 52 4 discovery discovery NN 10_1101-436634 52 5 of of IN 10_1101-436634 52 6 an an DT 10_1101-436634 52 7 arbitrarily arbitrarily RB 10_1101-436634 52 8 expansive expansive JJ 10_1101-436634 52 9 set set NN 10_1101-436634 52 10 of of IN 10_1101-436634 52 11 variants variant NNS 10_1101-436634 52 12 , , , 10_1101-436634 52 13 RegTools RegTools NNP 10_1101-436634 52 14 allows allow VBZ 10_1101-436634 52 15 the the DT 10_1101-436634 52 16 user user NN 10_1101-436634 52 17 to to TO 10_1101-436634 52 18 customize customize VB 10_1101-436634 52 19 the the DT 10_1101-436634 52 20 size size NN 10_1101-436634 52 21 of of IN 10_1101-436634 52 22 the the DT 10_1101-436634 52 23 exonic exonic JJ 10_1101-436634 52 24 / / SYM 10_1101-436634 52 25 intronic intronic JJ 10_1101-436634 52 26 windows window NNS 10_1101-436634 52 27 individually individually RB 10_1101-436634 52 28 ( ( -LRB- 10_1101-436634 52 29 e.g. e.g. 
RB 10_1101-436634 53 1 -i -i : 10_1101-436634 53 2 50 50 CD 10_1101-436634 53 3 -e -e SYM 10_1101-436634 53 4 5 5 CD 10_1101-436634 53 5 for for IN 10_1101-436634 53 6 intronic intronic JJ 10_1101-436634 53 7 variants variant NNS 10_1101-436634 53 8 50 50 CD 10_1101-436634 53 9 bp bp NN 10_1101-436634 53 10 from from IN 10_1101-436634 53 11 an an DT 10_1101-436634 53 12 exon exon JJ 10_1101-436634 53 13 edge edge NN 10_1101-436634 53 14 and and CC 10_1101-436634 53 15 exonic exonic JJ 10_1101-436634 53 16 variants variant NNS 10_1101-436634 53 17 5 5 CD 10_1101-436634 53 18 bp bp NN 10_1101-436634 53 19 from from IN 10_1101-436634 53 20 an an DT 10_1101-436634 53 21 exon exon JJ 10_1101-436634 53 22 edge edge NN 10_1101-436634 53 23 ) ) -RRB- 10_1101-436634 53 24 or or CC 10_1101-436634 53 25 even even RB 10_1101-436634 53 26 consider consider VB 10_1101-436634 53 27 all all RB 10_1101-436634 53 28 exonic exonic JJ 10_1101-436634 53 29 / / SYM 10_1101-436634 53 30 intronic intronic JJ 10_1101-436634 53 31 variants variant NNS 10_1101-436634 53 32 as as IN 10_1101-436634 53 33 potentially potentially RB 10_1101-436634 53 34 splicing splice VBG 10_1101-436634 53 35 - - HYPH 10_1101-436634 53 36 relevant relevant JJ 10_1101-436634 53 37 ( ( -LRB- 10_1101-436634 53 38 e.g. e.g. RB 10_1101-436634 54 1 -E -E : 10_1101-436634 54 2 or or CC 10_1101-436634 54 3 -I -I . 10_1101-436634 54 4 ) ) -RRB- 10_1101-436634 54 5 ( ( -LRB- 10_1101-436634 54 6 Figure figure NN 10_1101-436634 54 7 1A 1a NN 10_1101-436634 54 8 ) ) -RRB- 10_1101-436634 54 9 . . . 10_1101-436634 55 1 The the DT 10_1101-436634 55 2 junctions junction NNS 10_1101-436634 55 3 module module NN 10_1101-436634 55 4 contains contain VBZ 10_1101-436634 55 5 the the DT 10_1101-436634 55 6 extract extract NN 10_1101-436634 55 7 and and CC 10_1101-436634 55 8 annotate annotate JJ 10_1101-436634 55 9 commands command NNS 10_1101-436634 55 10 . . . 
10_1101-436634 56 1 The the DT 10_1101-436634 56 2 junctions junction NNS 10_1101-436634 56 3 extract extract VBP 10_1101-436634 56 4 command command NN 10_1101-436634 56 5 takes take VBZ 10_1101-436634 56 6 an an DT 10_1101-436634 56 7 alignment alignment NN 10_1101-436634 56 8 file file NN 10_1101-436634 56 9 containing contain VBG 10_1101-436634 56 10 aligned align VBN 10_1101-436634 56 11 RNA RNA NNP 10_1101-436634 56 12 - - HYPH 10_1101-436634 56 13 seq seq NN 10_1101-436634 56 14 reads read NNS 10_1101-436634 56 15 , , , 10_1101-436634 56 16 infers infer VBZ 10_1101-436634 56 17 the the DT 10_1101-436634 56 18 exon exon NN 10_1101-436634 56 19 - - HYPH 10_1101-436634 56 20 exon exon NN 10_1101-436634 56 21 boundaries boundary NNS 10_1101-436634 56 22 based base VBN 10_1101-436634 56 23 on on IN 10_1101-436634 56 24 the the DT 10_1101-436634 56 25 CIGAR cigar NN 10_1101-436634 56 26 strings26 strings26 NN 10_1101-436634 56 27 , , , 10_1101-436634 56 28 and and CC 10_1101-436634 56 29 outputs output NNS 10_1101-436634 56 30 each each DT 10_1101-436634 56 31 “ " `` 10_1101-436634 56 32 junction junction NN 10_1101-436634 56 33 ” " '' 10_1101-436634 56 34 as as IN 10_1101-436634 56 35 a a DT 10_1101-436634 56 36 feature feature NN 10_1101-436634 56 37 in in IN 10_1101-436634 56 38 BED12 BED12 NNP 10_1101-436634 56 39 format format NN 10_1101-436634 56 40 . . . 
10_1101-436634 57 1 The the DT 10_1101-436634 57 2 junctions junction NNS 10_1101-436634 57 3 annotate annotate VBP 10_1101-436634 57 4 command command NN 10_1101-436634 57 5 takes take VBZ 10_1101-436634 57 6 a a DT 10_1101-436634 57 7 file file NN 10_1101-436634 57 8 of of IN 10_1101-436634 57 9 junctions junction NNS 10_1101-436634 57 10 in in IN 10_1101-436634 57 11 BED12 BED12 NNP 10_1101-436634 57 12 format format NN 10_1101-436634 57 13 ( ( -LRB- 10_1101-436634 57 14 such such JJ 10_1101-436634 57 15 as as IN 10_1101-436634 57 16 the the DT 10_1101-436634 57 17 one one CD 10_1101-436634 57 18 output output NN 10_1101-436634 57 19 by by IN 10_1101-436634 57 20 junctions junction NNS 10_1101-436634 57 21 extract extract NNP 10_1101-436634 57 22 ) ) -RRB- 10_1101-436634 57 23 , , , 10_1101-436634 57 24 a a DT 10_1101-436634 57 25 FASTA FASTA NNP 10_1101-436634 57 26 file file NN 10_1101-436634 57 27 containing contain VBG 10_1101-436634 57 28 the the DT 10_1101-436634 57 29 reference reference NN 10_1101-436634 57 30 genome genome NN 10_1101-436634 57 31 , , , 10_1101-436634 57 32 and and CC 10_1101-436634 57 33 a a DT 10_1101-436634 57 34 GTF gtf NN 10_1101-436634 57 35 file file NN 10_1101-436634 57 36 containing contain VBG 10_1101-436634 57 37 reference reference NN 10_1101-436634 57 38 transcriptome transcriptome DT 10_1101-436634 57 39 annotations annotation NNS 10_1101-436634 57 40 and and CC 10_1101-436634 57 41 generates generate VBZ 10_1101-436634 57 42 a a DT 10_1101-436634 57 43 TSV TSV NNP 10_1101-436634 57 44 file file NN 10_1101-436634 57 45 , , , 10_1101-436634 57 46 annotating annotate VBG 10_1101-436634 57 47 each each DT 10_1101-436634 57 48 junction junction NN 10_1101-436634 57 49 with with IN 10_1101-436634 57 50 : : : 10_1101-436634 57 51 the the DT 10_1101-436634 57 52 number number NN 10_1101-436634 57 53 of of IN 10_1101-436634 57 54 acceptor acceptor NN 10_1101-436634 57 55 sites site NNS 10_1101-436634 57 56 , , , 10_1101-436634 57 
57 donor donor NN 10_1101-436634 57 58 sites site NNS 10_1101-436634 57 59 , , , 10_1101-436634 57 60 and and CC 10_1101-436634 57 61 exons exon NNS 10_1101-436634 57 62 skipped skip VBD 10_1101-436634 57 63 , , , 10_1101-436634 57 64 and and CC 10_1101-436634 57 65 the the DT 10_1101-436634 57 66 identities identity NNS 10_1101-436634 57 67 of of IN 10_1101-436634 57 68 known know VBN 10_1101-436634 57 69 overlapping overlap VBG 10_1101-436634 57 70 transcripts transcript NNS 10_1101-436634 57 71 and and CC 10_1101-436634 57 72 genes gene NNS 10_1101-436634 57 73 . . . 10_1101-436634 58 1 We -PRON- PRP 10_1101-436634 58 2 also also RB 10_1101-436634 58 3 annotate annotate VBP 10_1101-436634 58 4 the the DT 10_1101-436634 58 5 “ " `` 10_1101-436634 58 6 junction junction NN 10_1101-436634 58 7 type type NN 10_1101-436634 58 8 ” " '' 10_1101-436634 58 9 , , , 10_1101-436634 58 10 which which WDT 10_1101-436634 58 11 denotes denote VBZ 10_1101-436634 58 12 if if IN 10_1101-436634 58 13 and and CC 10_1101-436634 58 14 how how WRB 10_1101-436634 58 15 the the DT 10_1101-436634 58 16 junction junction NN 10_1101-436634 58 17 is be VBZ 10_1101-436634 58 18 novel novel JJ 10_1101-436634 58 19 ( ( -LRB- 10_1101-436634 58 20 i.e. i.e. FW 10_1101-436634 59 1 different different JJ 10_1101-436634 59 2 compared compare VBN 10_1101-436634 59 3 to to IN 10_1101-436634 59 4 provided provide VBN 10_1101-436634 59 5 transcript transcript NN 10_1101-436634 59 6 annotations annotation NNS 10_1101-436634 59 7 ) ) -RRB- 10_1101-436634 59 8 . . . 
10_1101-436634 60 1 If if IN 10_1101-436634 60 2 the the DT 10_1101-436634 60 3 donor donor NN 10_1101-436634 60 4 is be VBZ 10_1101-436634 60 5 known know VBN 10_1101-436634 60 6 , , , 10_1101-436634 60 7 but but CC 10_1101-436634 60 8 the the DT 10_1101-436634 60 9 acceptor acceptor NN 10_1101-436634 60 10 is be VBZ 10_1101-436634 60 11 not not RB 10_1101-436634 60 12 or or CC 10_1101-436634 60 13 vice vice RB 10_1101-436634 60 14 - - HYPH 10_1101-436634 60 15 versa versa JJ 10_1101-436634 60 16 , , , 10_1101-436634 60 17 it -PRON- PRP 10_1101-436634 60 18 is be VBZ 10_1101-436634 60 19 marked mark VBN 10_1101-436634 60 20 as as IN 10_1101-436634 60 21 “ " `` 10_1101-436634 60 22 D d NN 10_1101-436634 60 23 ” " '' 10_1101-436634 60 24 or or CC 10_1101-436634 60 25 “ " `` 10_1101-436634 60 26 A a NN 10_1101-436634 60 27 ” " '' 10_1101-436634 60 28 , , , 10_1101-436634 60 29 respectively respectively RB 10_1101-436634 60 30 . . . 10_1101-436634 61 1 If if IN 10_1101-436634 61 2 both both DT 10_1101-436634 61 3 are be VBP 10_1101-436634 61 4 known know VBN 10_1101-436634 61 5 , , , 10_1101-436634 61 6 but but CC 10_1101-436634 61 7 the the DT 10_1101-436634 61 8 pairing pairing NN 10_1101-436634 61 9 is be VBZ 10_1101-436634 61 10 not not RB 10_1101-436634 61 11 known know VBN 10_1101-436634 61 12 , , , 10_1101-436634 61 13 it -PRON- PRP 10_1101-436634 61 14 is be VBZ 10_1101-436634 61 15 marked mark VBN 10_1101-436634 61 16 as as IN 10_1101-436634 61 17 “ " `` 10_1101-436634 61 18 NDA NDA NNP 10_1101-436634 61 19 ” " '' 10_1101-436634 61 20 , , , 10_1101-436634 61 21 whereas whereas IN 10_1101-436634 61 22 if if IN 10_1101-436634 61 23 both both DT 10_1101-436634 61 24 are be VBP 
10_1101-436634 62 31 unknown unknown JJ 10_1101-436634 62 32 , , , 10_1101-436634 62 33 it -PRON- PRP 10_1101-436634 62 34 is be VBZ 10_1101-436634 62 35 marked mark VBN 10_1101-436634 62 36 as as IN 10_1101-436634 62 37 “ " `` 10_1101-436634 62 38 N N NNP 10_1101-436634 62 39 ” " '' 10_1101-436634 62 40 . . . 10_1101-436634 63 1 If if IN 10_1101-436634 63 2 the the DT 10_1101-436634 63 3 junction junction NN 10_1101-436634 63 4 is be VBZ 10_1101-436634 63 5 not not RB 10_1101-436634 63 6 novel novel JJ 10_1101-436634 63 7 ( ( -LRB- 10_1101-436634 63 8 i.e. i.e. FW 10_1101-436634 64 1 it -PRON- PRP 10_1101-436634 64 2 appears appear VBZ 10_1101-436634 64 3 in in IN 10_1101-436634 64 4 at at RB 10_1101-436634 64 5 least least RBS 10_1101-436634 64 6 one one CD 10_1101-436634 64 7 transcript transcript NN 10_1101-436634 64 8 in in IN 10_1101-436634 64 9 the the DT 10_1101-436634 64 10 supplied supplied JJ 10_1101-436634 64 11 GTF GTF NNP 10_1101-436634 64 12 ) ) -RRB- 10_1101-436634 64 13 , , , 10_1101-436634 64 14 it -PRON- PRP 10_1101-436634 64 15 is be VBZ 10_1101-436634 64 16 marked mark VBN 10_1101-436634 64 17 as as IN 10_1101-436634 64 18 “ " `` 10_1101-436634 64 19 DA DA NNP 10_1101-436634 64 20 ” " '' 10_1101-436634 64 21 ( ( -LRB- 10_1101-436634 64 22 Figure figure NN 10_1101-436634 64 23 1B 1b NN 10_1101-436634 64 24 ) ) -RRB- 10_1101-436634 64 25 . . . 
10_1101-436634 65 1 The the DT 10_1101-436634 65 2 cis cis NN 10_1101-436634 65 3 - - HYPH 10_1101-436634 65 4 splice splice NN 10_1101-436634 65 5 - - HYPH 10_1101-436634 65 6 effects effect NNS 10_1101-436634 65 7 module module NN 10_1101-436634 65 8 contains contain VBZ 10_1101-436634 65 9 the the DT 10_1101-436634 65 10 identify identify NN 10_1101-436634 65 11 command command NN 10_1101-436634 65 12 , , , 10_1101-436634 65 13 which which WDT 10_1101-436634 65 14 identifies identify VBZ 10_1101-436634 65 15 potential potential JJ 10_1101-436634 65 16 splice- splice- NNP 10_1101-436634 65 17 altering alter VBG 10_1101-436634 65 18 variants variant NNS 10_1101-436634 65 19 from from IN 10_1101-436634 65 20 sequencing sequence VBG 10_1101-436634 65 21 data datum NNS 10_1101-436634 65 22 . . . 10_1101-436634 66 1 The the DT 10_1101-436634 66 2 following follow VBG 10_1101-436634 66 3 are be VBP 10_1101-436634 66 4 required require VBN 10_1101-436634 66 5 as as IN 10_1101-436634 66 6 input input NN 10_1101-436634 66 7 : : : 10_1101-436634 66 8 a a DT 10_1101-436634 66 9 VCF VCF NNP 10_1101-436634 66 10 file file NN 10_1101-436634 66 11 containing contain VBG 10_1101-436634 66 12 variant variant JJ 10_1101-436634 66 13 calls call NNS 10_1101-436634 66 14 , , , 10_1101-436634 66 15 an an DT 10_1101-436634 66 16 alignment alignment NN 10_1101-436634 66 17 file file NN 10_1101-436634 66 18 containing contain VBG 10_1101-436634 66 19 aligned align VBN 10_1101-436634 66 20 RNA RNA NNP 10_1101-436634 66 21 - - HYPH 10_1101-436634 66 22 sequencing sequence VBG 10_1101-436634 66 23 reads read NNS 10_1101-436634 66 24 , , , 10_1101-436634 66 25 a a DT 10_1101-436634 66 26 reference reference NN 10_1101-436634 66 27 genome genome JJ 10_1101-436634 66 28 FASTA FASTA NNP 10_1101-436634 66 29 file file NN 10_1101-436634 66 30 , , , 10_1101-436634 66 31 and and CC 10_1101-436634 66 32 a a DT 10_1101-436634 66 33 reference reference NN 10_1101-436634 66 34 transcriptome 
transcriptome DT 10_1101-436634 66 35 GTF gtf NN 10_1101-436634 66 36 file file NN 10_1101-436634 66 37 . . . 10_1101-436634 67 1 The the DT 10_1101-436634 67 2 identify identify NN 10_1101-436634 67 3 pipeline pipeline NN 10_1101-436634 67 4 internally internally RB 10_1101-436634 67 5 relies rely VBZ 10_1101-436634 67 6 on on IN 10_1101-436634 67 7 variants variant NNS 10_1101-436634 67 8 annotate annotate JJ 10_1101-436634 67 9 , , , 10_1101-436634 67 10 junctions junction NNS 10_1101-436634 67 11 extract extract NN 10_1101-436634 67 12 , , , 10_1101-436634 67 13 and and CC 10_1101-436634 67 14 junctions junction NNS 10_1101-436634 67 15 annotate annotate JJ 10_1101-436634 67 16 to to TO 10_1101-436634 67 17 output output VB 10_1101-436634 67 18 a a DT 10_1101-436634 67 19 TSV TSV NNP 10_1101-436634 67 20 containing contain VBG 10_1101-436634 67 21 junctions junction NNS 10_1101-436634 67 22 proximal proximal JJ 10_1101-436634 67 23 to to IN 10_1101-436634 67 24 putatively putatively RB 10_1101-436634 67 25 splicing splicing VB 10_1101-436634 67 26 - - HYPH 10_1101-436634 67 27 relevant relevant JJ 10_1101-436634 67 28 variants variant NNS 10_1101-436634 67 29 . . . 10_1101-436634 68 1 The the DT 10_1101-436634 68 2 identify identify NN 10_1101-436634 68 3 pipeline pipeline NN 10_1101-436634 68 4 can can MD 10_1101-436634 68 5 be be VB 10_1101-436634 68 6 customized customize VBN 10_1101-436634 68 7 using use VBG 10_1101-436634 68 8 the the DT 10_1101-436634 68 9 same same JJ 10_1101-436634 68 10 parameters parameter NNS 10_1101-436634 68 11 as as IN 10_1101-436634 68 12 in in IN 10_1101-436634 68 13 the the DT 10_1101-436634 68 14 individual individual JJ 10_1101-436634 68 15 commands command NNS 10_1101-436634 68 16 . . . 
10_1101-436634 69 1 Briefly briefly RB 10_1101-436634 69 2 , , , 10_1101-436634 69 3 cis cis NN 10_1101-436634 69 4 - - HYPH 10_1101-436634 69 5 splice splice NN 10_1101-436634 69 6 - - HYPH 10_1101-436634 69 7 effects effect NNS 10_1101-436634 69 8 identify identify VBP 10_1101-436634 69 9 first first JJ 10_1101-436634 69 10 performs perform VBZ 10_1101-436634 69 11 variants variant NNS 10_1101-436634 69 12 annotate annotate JJ 10_1101-436634 69 13 to to TO 10_1101-436634 69 14 determine determine VB 10_1101-436634 69 15 the the DT 10_1101-436634 69 16 splicing splicing NN 10_1101-436634 69 17 - - HYPH 10_1101-436634 69 18 relevance relevance NN 10_1101-436634 69 19 of of IN 10_1101-436634 69 20 each each DT 10_1101-436634 69 21 variant variant JJ 10_1101-436634 69 22 in in IN 10_1101-436634 69 23 the the DT 10_1101-436634 69 24 input input NN 10_1101-436634 69 25 VCF vcf NN 10_1101-436634 69 26 . . . 10_1101-436634 70 1 For for IN 10_1101-436634 70 2 each each DT 10_1101-436634 70 3 variant variant JJ 10_1101-436634 70 4 , , , 10_1101-436634 70 5 a a DT 10_1101-436634 70 6 “ " `` 10_1101-436634 70 7 splice splice NN 10_1101-436634 70 8 junction junction NN 10_1101-436634 70 9 region region NN 10_1101-436634 70 10 ” " '' 10_1101-436634 70 11 is be VBZ 10_1101-436634 70 12 determined determine VBN 10_1101-436634 70 13 by by IN 10_1101-436634 70 14 finding find VBG 10_1101-436634 70 15 the the DT 10_1101-436634 70 16 largest large JJS 10_1101-436634 70 17 span span NN 10_1101-436634 70 18 of of IN 10_1101-436634 70 19 sequence sequence NN 10_1101-436634 70 20 space space NN 10_1101-436634 70 21 between between IN 10_1101-436634 70 22 the the DT 10_1101-436634 70 23 exons exon NNS 10_1101-436634 70 24 that that WDT 10_1101-436634 70 25 flank flank VBP 10_1101-436634 70 26 the the DT 10_1101-436634 70 27 exon exon NNS 10_1101-436634 70 28 associated associate VBN 10_1101-436634 70 29 with with IN 10_1101-436634 70 30 the the DT 10_1101-436634 70 31 variant variant NN 
10_1101-436634 70 32 . . . 10_1101-436634 71 1 From from IN 10_1101-436634 71 2 here here RB 10_1101-436634 71 3 , , , 10_1101-436634 71 4 junctions junction NNS 10_1101-436634 71 5 extract extract VBP 10_1101-436634 71 6 identifies identifie NNS 10_1101-436634 71 7 splicing splice VBG 10_1101-436634 71 8 junctions junction NNS 10_1101-436634 71 9 present present JJ 10_1101-436634 71 10 in in IN 10_1101-436634 71 11 the the DT 10_1101-436634 71 12 RNA RNA NNP 10_1101-436634 71 13 - - HYPH 10_1101-436634 71 14 seq seq NN 10_1101-436634 71 15 BAM BAM NNP 10_1101-436634 71 16 . . . 10_1101-436634 72 1 Next next RB 10_1101-436634 72 2 , , , 10_1101-436634 72 3 junctions junction NNS 10_1101-436634 72 4 annotate annotate VBP 10_1101-436634 72 5 labels label NNS 10_1101-436634 72 6 each each DT 10_1101-436634 72 7 extracted extract VBN 10_1101-436634 72 8 junction junction NN 10_1101-436634 72 9 with with IN 10_1101-436634 72 10 information information NN 10_1101-436634 72 11 from from IN 10_1101-436634 72 12 the the DT 10_1101-436634 72 13 reference reference NN 10_1101-436634 72 14 transcriptome transcriptome VBN 10_1101-436634 72 15 as as IN 10_1101-436634 72 16 described describe VBN 10_1101-436634 72 17 above above IN 10_1101-436634 72 18 and and CC 10_1101-436634 72 19 its -PRON- PRP$ 10_1101-436634 72 20 associated associate VBN 10_1101-436634 72 21 variants variant NNS 10_1101-436634 72 22 based base VBN 10_1101-436634 72 23 on on IN 10_1101-436634 72 24 splice splice NN 10_1101-436634 72 25 junction junction NN 10_1101-436634 72 26 region region NN 10_1101-436634 72 27 overlap overlap NN 10_1101-436634 72 28 ( ( -LRB- 10_1101-436634 72 29 Figure figure NN 10_1101-436634 72 30 1C 1c NN 10_1101-436634 72 31 ) ) -RRB- 10_1101-436634 72 32 . . . 
10_1101-436634 73 1 For for IN 10_1101-436634 73 2 our -PRON- PRP$ 10_1101-436634 73 3 analysis analysis NN 10_1101-436634 73 4 , , , 10_1101-436634 73 5 we -PRON- PRP 10_1101-436634 73 6 annotated annotate VBD 10_1101-436634 73 7 the the DT 10_1101-436634 73 8 pairs pair NNS 10_1101-436634 73 9 of of IN 10_1101-436634 73 10 associated associated JJ 10_1101-436634 73 11 variants variant NNS 10_1101-436634 73 12 and and CC 10_1101-436634 73 13 junctions junction NNS 10_1101-436634 73 14 identified identify VBN 10_1101-436634 73 15 by by IN 10_1101-436634 73 16 RegTools RegTools NNP 10_1101-436634 73 17 , , , 10_1101-436634 73 18 which which WDT 10_1101-436634 73 19 we -PRON- PRP 10_1101-436634 73 20 refer refer VBP 10_1101-436634 73 21 to to IN 10_1101-436634 73 22 as as IN 10_1101-436634 73 23 “ " `` 10_1101-436634 73 24 events event NNS 10_1101-436634 73 25 ” " '' 10_1101-436634 73 26 , , , 10_1101-436634 73 27 with with IN 10_1101-436634 73 28 additional additional JJ 10_1101-436634 73 29 information information NN 10_1101-436634 73 30 such such JJ 10_1101-436634 73 31 as as IN 10_1101-436634 73 32 whether whether IN 10_1101-436634 73 33 this this DT 10_1101-436634 73 34 association association NN 10_1101-436634 73 35 was be VBD 10_1101-436634 73 36 identified identify VBN 10_1101-436634 73 37 by by IN 10_1101-436634 73 38 a a DT 10_1101-436634 73 39 comparable comparable JJ 10_1101-436634 73 40 tool tool NN 10_1101-436634 73 41 , , , 10_1101-436634 73 42 the the DT 10_1101-436634 73 43 junction junction NN 10_1101-436634 73 44 was be VBD 10_1101-436634 73 45 found find VBN 10_1101-436634 73 46 in in IN 10_1101-436634 73 47 GTEx GTEx NNS 10_1101-436634 73 48 , , , 10_1101-436634 73 49 and and CC 10_1101-436634 73 50 whether whether IN 10_1101-436634 73 51 the the DT 10_1101-436634 73 52 event event NN 10_1101-436634 73 53 occurred occur VBD 10_1101-436634 73 54 in in IN 10_1101-436634 73 55 a a DT 10_1101-436634 73 56 cancer cancer NN 10_1101-436634 73 57 gene 
gene NN 10_1101-436634 73 58 according accord VBG 10_1101-436634 73 59 to to IN 10_1101-436634 73 60 Cancer Cancer NNP 10_1101-436634 73 61 Gene Gene NNP 10_1101-436634 73 62 Census Census NNP 10_1101-436634 73 63 ( ( -LRB- 10_1101-436634 73 64 CGC CGC NNP 10_1101-436634 73 65 ) ) -RRB- 10_1101-436634 73 66 ( ( -LRB- 10_1101-436634 73 67 Figure Figure NNP 10_1101-436634 73 68 1C)24,27 1C)24,27 NNP 10_1101-436634 73 69 . . . 10_1101-436634 74 1 Finally finally RB 10_1101-436634 74 2 , , , 10_1101-436634 74 3 we -PRON- PRP 10_1101-436634 74 4 created create VBD 10_1101-436634 74 5 IGV IGV NNP 10_1101-436634 74 6 sessions session NNS 10_1101-436634 74 7 for for IN 10_1101-436634 74 8 each each DT 10_1101-436634 74 9 event event NN 10_1101-436634 74 10 identified identify VBN 10_1101-436634 74 11 by by IN 10_1101-436634 74 12 RegTools regtool NNS 10_1101-436634 74 13 that that WDT 10_1101-436634 74 14 contained contain VBD 10_1101-436634 74 15 a a DT 10_1101-436634 74 16 bed bed NN 10_1101-436634 74 17 file file NN 10_1101-436634 74 18 with with IN 10_1101-436634 74 19 the the DT 10_1101-436634 74 20 junction junction NN 10_1101-436634 74 21 , , , 10_1101-436634 74 22 a a DT 10_1101-436634 74 23 VCF VCF NNP 10_1101-436634 74 24 file file NN 10_1101-436634 74 25 with with IN 10_1101-436634 74 26 the the DT 10_1101-436634 74 27 variant variant NN 10_1101-436634 74 28 , , , 10_1101-436634 74 29 and and CC 10_1101-436634 74 30 an an DT 10_1101-436634 74 31 alignment alignment NN 10_1101-436634 74 32 ( ( -LRB- 10_1101-436634 74 33 BAM BAM NNP 10_1101-436634 74 34 ) ) -RRB- 10_1101-436634 74 35 file file NN 10_1101-436634 74 36 for for IN 10_1101-436634 74 37 each each DT 10_1101-436634 74 38 sample sample NN 10_1101-436634 74 39 that that WDT 10_1101-436634 74 40 contained contain VBD 10_1101-436634 74 41 the the DT 10_1101-436634 74 42 variant28 variant28 NNP 10_1101-436634 74 43 . . . 
10_1101-436634 75 1 These these DT 10_1101-436634 75 2 IGV IGV NNP 10_1101-436634 75 3 sessions session NNS 10_1101-436634 75 4 were be VBD 10_1101-436634 75 5 used use VBN 10_1101-436634 75 6 to to TO 10_1101-436634 75 7 manually manually RB 10_1101-436634 75 8 review review VB 10_1101-436634 75 9 candidate candidate NN 10_1101-436634 75 10 events event NNS 10_1101-436634 75 11 to to TO 10_1101-436634 75 12 assess assess VB 10_1101-436634 75 13 whether whether IN 10_1101-436634 75 14 the the DT 10_1101-436634 75 15 association association NN 10_1101-436634 75 16 between between IN 10_1101-436634 75 17 the the DT 10_1101-436634 75 18 variant variant JJ 10_1101-436634 75 19 and and CC 10_1101-436634 75 20 junction junction NN 10_1101-436634 75 21 makes make VBZ 10_1101-436634 75 22 sense sense NN 10_1101-436634 75 23 in in IN 10_1101-436634 75 24 a a DT 10_1101-436634 75 25 biological biological JJ 10_1101-436634 75 26 context context NN 10_1101-436634 75 27 . . . 10_1101-436634 76 1 RegTools RegTools NNP 10_1101-436634 76 2 is be VBZ 10_1101-436634 76 3 designed design VBN 10_1101-436634 76 4 for for IN 10_1101-436634 76 5 broad broad JJ 10_1101-436634 76 6 applicability applicability NN 10_1101-436634 76 7 and and CC 10_1101-436634 76 8 computational computational JJ 10_1101-436634 76 9 efficiency efficiency NN 10_1101-436634 76 10 . . . 
10_1101-436634 77 1 By by IN 10_1101-436634 77 2 relying rely VBG 10_1101-436634 77 3 on on IN 10_1101-436634 77 4 well- well- JJ 10_1101-436634 77 5 established establish VBN 10_1101-436634 77 6 standards standard NNS 10_1101-436634 77 7 for for IN 10_1101-436634 77 8 sequence sequence NN 10_1101-436634 77 9 alignments alignment NNS 10_1101-436634 77 10 , , , 10_1101-436634 77 11 annotation annotation NN 10_1101-436634 77 12 files file NNS 10_1101-436634 77 13 , , , 10_1101-436634 77 14 and and CC 10_1101-436634 77 15 variant variant JJ 10_1101-436634 77 16 calls call NNS 10_1101-436634 77 17 and and CC 10_1101-436634 77 18 by by IN 10_1101-436634 77 19 remaining remain VBG 10_1101-436634 77 20 agnostic agnostic JJ 10_1101-436634 77 21 to to TO 10_1101-436634 77 22 downstream downstream JJ 10_1101-436634 77 23 statistical statistical JJ 10_1101-436634 77 24 methods method NNS 10_1101-436634 77 25 and and CC 10_1101-436634 77 26 comparisons comparison NNS 10_1101-436634 77 27 , , , 10_1101-436634 77 28 our -PRON- PRP$ 10_1101-436634 77 29 tool tool NN 10_1101-436634 77 30 can can MD 10_1101-436634 77 31 be be VB 10_1101-436634 77 32 applied apply VBN 10_1101-436634 77 33 to to IN 10_1101-436634 77 34 a a DT 10_1101-436634 77 35 broad broad JJ 10_1101-436634 77 36 set set NN 10_1101-436634 77 37 of of IN 10_1101-436634 77 38 scientific scientific JJ 10_1101-436634 77 39 queries query NNS 10_1101-436634 77 40 and and CC 10_1101-436634 77 41 datasets dataset NNS 10_1101-436634 77 42 . . . 
10_1101-436634 78 1 Moreover moreover RB 10_1101-436634 78 2 , , , 10_1101-436634 78 3 performance performance NN 10_1101-436634 78 4 tests test NNS 10_1101-436634 78 5 show show VBP 10_1101-436634 78 6 that that IN 10_1101-436634 78 7 cis- cis- IN 10_1101-436634 78 8 splice splice NN 10_1101-436634 78 9 - - HYPH 10_1101-436634 78 10 effects effect NNS 10_1101-436634 78 11 identify identify VBP 10_1101-436634 78 12 can can MD 10_1101-436634 78 13 process process VB 10_1101-436634 78 14 a a DT 10_1101-436634 78 15 typical typical JJ 10_1101-436634 78 16 candidate candidate NN 10_1101-436634 78 17 variant variant JJ 10_1101-436634 78 18 list list NN 10_1101-436634 78 19 of of IN 10_1101-436634 78 20 1,500,000 1,500,000 CD 10_1101-436634 78 21 variants variant NNS 10_1101-436634 78 22 and and CC 10_1101-436634 78 23 a a DT 10_1101-436634 78 24 corresponding corresponding JJ 10_1101-436634 78 25 RNA RNA NNP 10_1101-436634 78 26 - - HYPH 10_1101-436634 78 27 seq seq NN 10_1101-436634 78 28 BAM BAM NNP 10_1101-436634 78 29 file file NN 10_1101-436634 78 30 of of IN 10_1101-436634 78 31 82,807,868 82,807,868 CD 10_1101-436634 78 32 reads read NNS 10_1101-436634 78 33 in in IN 10_1101-436634 78 34 just just RB 10_1101-436634 78 35 ~8 ~8 NFP 10_1101-436634 78 36 minutes minute NNS 10_1101-436634 78 37 ( ( -LRB- 10_1101-436634 78 38 Supplementary Supplementary NNP 10_1101-436634 78 39 Figure Figure NNP 10_1101-436634 78 40 1 1 CD 10_1101-436634 78 41 ) ) -RRB- 10_1101-436634 78 42 . . . 
10_1101-436634 79 1 Pan pan JJ 10_1101-436634 79 2 - - JJ 10_1101-436634 79 3 cancer cancer JJ 10_1101-436634 79 4 analysis analysis NN 10_1101-436634 79 5 of of IN 10_1101-436634 79 6 35 35 CD 10_1101-436634 79 7 tumor tumor NN 10_1101-436634 79 8 types type NNS 10_1101-436634 79 9 identifies identify VBZ 10_1101-436634 79 10 somatic somatic JJ 10_1101-436634 79 11 variants variant NNS 10_1101-436634 79 12 that that WDT 10_1101-436634 79 13 alter alter VBP 10_1101-436634 79 14 canonical canonical JJ 10_1101-436634 79 15 splicing splicing NN 10_1101-436634 79 16 RegTools RegTools NNP 10_1101-436634 79 17 was be VBD 10_1101-436634 79 18 applied apply VBN 10_1101-436634 79 19 to to IN 10_1101-436634 79 20 9,173 9,173 CD 10_1101-436634 79 21 samples sample NNS 10_1101-436634 79 22 over over IN 10_1101-436634 79 23 35 35 CD 10_1101-436634 79 24 cancer cancer NN 10_1101-436634 79 25 types type NNS 10_1101-436634 79 26 . . . 10_1101-436634 80 1 32 32 CD 10_1101-436634 80 2 of of IN 10_1101-436634 80 3 these these DT 10_1101-436634 80 4 cohorts cohort NNS 10_1101-436634 80 5 came come VBD 10_1101-436634 80 6 from from IN 10_1101-436634 80 7 TCGA TCGA NNP 10_1101-436634 80 8 while while IN 10_1101-436634 80 9 the the DT 10_1101-436634 80 10 remaining remain VBG 10_1101-436634 80 11 three three CD 10_1101-436634 80 12 were be VBD 10_1101-436634 80 13 obtained obtain VBN 10_1101-436634 80 14 from from IN 10_1101-436634 80 15 other other JJ 10_1101-436634 80 16 projects project NNS 10_1101-436634 80 17 being be VBG 10_1101-436634 80 18 conducted conduct VBN 10_1101-436634 80 19 at at IN 10_1101-436634 80 20 MGI MGI NNP 10_1101-436634 80 21 . . . 10_1101-436634 81 1 Cohort cohort NN 10_1101-436634 81 2 sizes size NNS 10_1101-436634 81 3 ranged range VBD 10_1101-436634 81 4 from from IN 10_1101-436634 81 5 21 21 CD 10_1101-436634 81 6 to to TO 10_1101-436634 81 7 1,022 1,022 CD 10_1101-436634 81 8 samples sample NNS 10_1101-436634 81 9 . . . 
10_1101-436634 82 1 In in IN 10_1101-436634 82 2 total total JJ 10_1101-436634 82 3 , , , 10_1101-436634 82 4 6,370,631 6,370,631 CD 10_1101-436634 82 5 variants variant NNS 10_1101-436634 82 6 ( ( -LRB- 10_1101-436634 82 7 Figure Figure NNP 10_1101-436634 82 8 2A 2a NN 10_1101-436634 82 9 ) ) -RRB- 10_1101-436634 82 10 and and CC 10_1101-436634 82 11 2,387,989,201 2,387,989,201 CD 10_1101-436634 82 12 junction junction NN 10_1101-436634 82 13 observations observation NNS 10_1101-436634 82 14 ( ( -LRB- 10_1101-436634 82 15 Figure figure NN 10_1101-436634 82 16 2B 2b NN 10_1101-436634 82 17 ) ) -RRB- 10_1101-436634 82 18 were be VBD 10_1101-436634 82 19 analyzed analyze VBN 10_1101-436634 82 20 by by IN 10_1101-436634 82 21 RegTools RegTools NNP 10_1101-436634 82 22 . . . 10_1101-436634 83 1 By by IN 10_1101-436634 83 2 comparing compare VBG 10_1101-436634 83 3 the the DT 10_1101-436634 83 4 number number NN 10_1101-436634 83 5 of of IN 10_1101-436634 83 6 initial initial JJ 10_1101-436634 83 7 variants variant NNS 10_1101-436634 83 8 per per IN 10_1101-436634 83 9 cohort cohort NN 10_1101-436634 83 10 to to IN 10_1101-436634 83 11 the the DT 10_1101-436634 83 12 number number NN 10_1101-436634 83 13 of of IN 10_1101-436634 83 14 statistically statistically RB 10_1101-436634 83 15 significant significant JJ 10_1101-436634 83 16 variants variant NNS 10_1101-436634 83 17 , , , 10_1101-436634 83 18 we -PRON- PRP 
10_1101-436634 84 31 were be VBD 10_1101-436634 84 32 able able JJ 10_1101-436634 84 33 to to TO 10_1101-436634 84 34 show show VB 10_1101-436634 84 35 that that IN 10_1101-436634 84 36 RegTools RegTools NNP 10_1101-436634 84 37 produces produce VBZ 10_1101-436634 84 38 a a DT 10_1101-436634 84 39 prioritized prioritize VBN 10_1101-436634 84 40 list list NN 10_1101-436634 84 41 of of IN 10_1101-436634 84 42 potential potential JJ 10_1101-436634 84 43 splice splice NN 10_1101-436634 84 44 relevant relevant JJ 10_1101-436634 84 45 variants variant NNS 10_1101-436634 84 46 ( ( -LRB- 10_1101-436634 84 47 Supplementary Supplementary NNP 10_1101-436634 84 48 Figure Figure NNP 10_1101-436634 84 49 2 2 CD 10_1101-436634 84 50 ) ) -RRB- 10_1101-436634 84 51 . . . 
10_1101-436634 85 1 Additionally additionally RB 10_1101-436634 85 2 , , , 10_1101-436634 85 3 when when WRB 10_1101-436634 85 4 analyzing analyze VBG 10_1101-436634 85 5 the the DT 10_1101-436634 85 6 junctions junction NNS 10_1101-436634 85 7 within within IN 10_1101-436634 85 8 each each DT 10_1101-436634 85 9 sample sample NN 10_1101-436634 85 10 , , , 10_1101-436634 85 11 we -PRON- PRP 10_1101-436634 85 12 found find VBD 10_1101-436634 85 13 that that IN 10_1101-436634 85 14 junctions junction NNS 10_1101-436634 85 15 present present JJ 10_1101-436634 85 16 in in IN 10_1101-436634 85 17 the the DT 10_1101-436634 85 18 reference reference NN 10_1101-436634 85 19 transcriptome transcriptome VBN 10_1101-436634 85 20 are be VBP 10_1101-436634 85 21 frequently frequently RB 10_1101-436634 85 22 seen see VBN 10_1101-436634 85 23 within within IN 10_1101-436634 85 24 GTEx GTEx NNP 10_1101-436634 85 25 data datum NNS 10_1101-436634 85 26 while while IN 10_1101-436634 85 27 junctions junction NNS 10_1101-436634 85 28 observed observe VBN 10_1101-436634 85 29 from from IN 10_1101-436634 85 30 a a DT 10_1101-436634 85 31 sample sample NN 10_1101-436634 85 32 ’s ’s POS 10_1101-436634 85 33 own own JJ 10_1101-436634 85 34 transcriptome transcriptome DT 10_1101-436634 85 35 data datum NNS 10_1101-436634 85 36 that that WDT 10_1101-436634 85 37 were be VBD 10_1101-436634 85 38 not not RB 10_1101-436634 85 39 present present JJ 10_1101-436634 85 40 in in IN 10_1101-436634 85 41 the the DT 10_1101-436634 85 42 reference reference NN 10_1101-436634 85 43 are be VBP 10_1101-436634 85 44 rarely rarely RB 10_1101-436634 85 45 seen see VBN 10_1101-436634 85 46 within within IN 10_1101-436634 85 47 GTEx GTEx NNS 10_1101-436634 85 48 ( ( -LRB- 10_1101-436634 85 49 Supplementary Supplementary NNP 10_1101-436634 85 50 Figure Figure NNP 10_1101-436634 85 51 3 3 CD 10_1101-436634 85 52 ) ) -RRB- 10_1101-436634 85 53 . . . 
10_1101-436634 86 1 235,778 235,778 CD 10_1101-436634 86 2 significant significant JJ 10_1101-436634 86 3 variant variant JJ 10_1101-436634 86 4 junction junction NN 10_1101-436634 86 5 pairings pairing NNS 10_1101-436634 86 6 were be VBD 10_1101-436634 86 7 found find VBN 10_1101-436634 86 8 for for IN 10_1101-436634 86 9 junctions junction NNS 10_1101-436634 86 10 that that WDT 10_1101-436634 86 11 use use VBP 10_1101-436634 86 12 a a DT 10_1101-436634 86 13 known know VBN 10_1101-436634 86 14 donor donor NN 10_1101-436634 86 15 and and CC 10_1101-436634 86 16 novel novel JJ 10_1101-436634 86 17 acceptor acceptor NN 10_1101-436634 86 18 ( ( -LRB- 10_1101-436634 86 19 D D NNP 10_1101-436634 86 20 ) ) -RRB- 10_1101-436634 86 21 , , , 10_1101-436634 86 22 novel novel JJ 10_1101-436634 86 23 donor donor NN 10_1101-436634 86 24 and and CC 10_1101-436634 86 25 known know VBN 10_1101-436634 86 26 acceptor acceptor NN 10_1101-436634 86 27 ( ( -LRB- 10_1101-436634 86 28 A a NN 10_1101-436634 86 29 ) ) -RRB- 10_1101-436634 86 30 , , , 10_1101-436634 86 31 or or CC 10_1101-436634 86 32 novel novel JJ 10_1101-436634 86 33 combination combination NN 10_1101-436634 86 34 of of IN 10_1101-436634 86 35 a a DT 10_1101-436634 86 36 known know VBN 10_1101-436634 86 37 donor donor NN 10_1101-436634 86 38 and and CC 10_1101-436634 86 39 a a DT 10_1101-436634 86 40 known know VBN 10_1101-436634 86 41 acceptor acceptor NN 10_1101-436634 86 42 ( ( -LRB- 10_1101-436634 86 43 NDA NDA NNP 10_1101-436634 86 44 ) ) -RRB- 10_1101-436634 86 45 , , , 10_1101-436634 86 46 with with IN 10_1101-436634 86 47 novel novel NN 10_1101-436634 86 48 here here RB 10_1101-436634 86 49 meaning mean VBG 10_1101-436634 86 50 that that IN 10_1101-436634 86 51 the the DT 10_1101-436634 86 52 junction junction NN 10_1101-436634 86 53 was be VBD 10_1101-436634 86 54 not not RB 10_1101-436634 86 55 found find VBN 10_1101-436634 86 56 in in IN 10_1101-436634 86 57 the the DT 10_1101-436634 86 58 reference reference 
NN 10_1101-436634 86 59 transcriptome transcriptome VBN 10_1101-436634 86 60 ( ( -LRB- 10_1101-436634 86 61 Methods method NNS 10_1101-436634 86 62 , , , 10_1101-436634 86 63 Figure figure NN 10_1101-436634 86 64 2C 2c NN 10_1101-436634 86 65 , , , 10_1101-436634 86 66 Supplemental Supplemental NNP 10_1101-436634 86 67 Files Files NNP 10_1101-436634 86 68 1 1 CD 10_1101-436634 86 69 and and CC 10_1101-436634 86 70 2 2 CD 10_1101-436634 86 71 ) ) -RRB- 10_1101-436634 86 72 . . . 10_1101-436634 87 1 While while IN 10_1101-436634 87 2 our -PRON- PRP$ 10_1101-436634 87 3 analysis analysis NN 10_1101-436634 87 4 primarily primarily RB 10_1101-436634 87 5 focuses focus VBZ 10_1101-436634 87 6 on on IN 10_1101-436634 87 7 variants variant NNS 10_1101-436634 87 8 in in IN 10_1101-436634 87 9 relation relation NN 10_1101-436634 87 10 to to IN 10_1101-436634 87 11 novel novel JJ 10_1101-436634 87 12 splice splice NN 10_1101-436634 87 13 events event NNS 10_1101-436634 87 14 because because IN 10_1101-436634 87 15 of of IN 10_1101-436634 87 16 the the DT 10_1101-436634 87 17 potential potential JJ 10_1101-436634 87 18 importance importance NN 10_1101-436634 87 19 of of IN 10_1101-436634 87 20 these these DT 10_1101-436634 87 21 events event NNS 10_1101-436634 87 22 within within IN 10_1101-436634 87 23 tumor tumor NN 10_1101-436634 87 24 processes process NNS 10_1101-436634 87 25 , , , 10_1101-436634 87 26 we -PRON- PRP 10_1101-436634 87 27 also also RB 10_1101-436634 87 28 wanted want VBD 10_1101-436634 87 29 to to TO 10_1101-436634 87 30 assess assess VB 10_1101-436634 87 31 how how WRB 10_1101-436634 87 32 often often RB 10_1101-436634 87 33 a a DT 10_1101-436634 87 34 variant variant NN 10_1101-436634 87 35 was be VBD 10_1101-436634 87 36 significantly significantly RB 10_1101-436634 87 37 associated associate VBN 10_1101-436634 87 38 with with IN 10_1101-436634 87 39 a a DT 10_1101-436634 87 40 known know VBN 10_1101-436634 87 41 junction junction NN 10_1101-436634 87 42 
. . . 10_1101-436634 88 1 5,157 5,157 CD 10_1101-436634 88 2 variant variant JJ 10_1101-436634 88 3 junction junction NN 10_1101-436634 88 4 pairings pairing NNS 10_1101-436634 88 5 were be VBD 10_1101-436634 88 6 found find VBN 10_1101-436634 88 7 for for IN 10_1101-436634 88 8 junctions junction NNS 10_1101-436634 88 9 known know VBN 10_1101-436634 88 10 to to IN 10_1101-436634 88 11 the the DT 10_1101-436634 88 12 reference reference NN 10_1101-436634 88 13 ( ( -LRB- 10_1101-436634 88 14 DA DA NNP 10_1101-436634 88 15 junctions junction NNS 10_1101-436634 88 16 ) ) -RRB- 10_1101-436634 88 17 ( ( -LRB- 10_1101-436634 88 18 Supplemental Supplemental NNP 10_1101-436634 88 19 Files Files NNP 10_1101-436634 88 20 3 3 CD 10_1101-436634 88 21 and and CC 10_1101-436634 88 22 4 4 CD 10_1101-436634 88 23 ) ) -RRB- 10_1101-436634 88 24 . . . 10_1101-436634 89 1 This this DT 10_1101-436634 89 2 finding finding NN 10_1101-436634 89 3 indicates indicate VBZ 10_1101-436634 89 4 that that IN 10_1101-436634 89 5 while while IN 10_1101-436634 89 6 splice splice NN 10_1101-436634 89 7 variants variant NNS 10_1101-436634 89 8 usually usually RB 10_1101-436634 89 9 result result VBP 10_1101-436634 89 10 in in IN 10_1101-436634 89 11 a a DT 10_1101-436634 89 12 novel novel JJ 10_1101-436634 89 13 junction junction NN 10_1101-436634 89 14 occurring occur VBG 10_1101-436634 89 15 , , , 10_1101-436634 89 16 they -PRON- PRP 10_1101-436634 89 17 sometimes sometimes RB 10_1101-436634 89 18 alter alter VBP 10_1101-436634 89 19 the the DT 10_1101-436634 89 20 expression expression NN 10_1101-436634 89 21 of of IN 10_1101-436634 89 22 known know VBN 10_1101-436634 89 23 junctions junction NNS 10_1101-436634 89 24 . . . 
10_1101-436634 90 1 Generally generally RB 10_1101-436634 90 2 , , , 10_1101-436634 90 3 significant significant JJ 10_1101-436634 90 4 events event NNS 10_1101-436634 90 5 were be VBD 10_1101-436634 90 6 evenly evenly RB 10_1101-436634 90 7 split split VBN 10_1101-436634 90 8 among among IN 10_1101-436634 90 9 each each DT 10_1101-436634 90 10 of of IN 10_1101-436634 90 11 the the DT 10_1101-436634 90 12 novel novel JJ 10_1101-436634 90 13 junction junction NN 10_1101-436634 90 14 types type NNS 10_1101-436634 90 15 considered consider VBN 10_1101-436634 90 16 ( ( -LRB- 10_1101-436634 90 17 D d NN 10_1101-436634 90 18 , , , 10_1101-436634 90 19 A a NN 10_1101-436634 90 20 , , , 10_1101-436634 90 21 and and CC 10_1101-436634 90 22 NDA NDA NNP 10_1101-436634 90 23 ) ) -RRB- 10_1101-436634 90 24 . . . 10_1101-436634 91 1 The the DT 10_1101-436634 91 2 number number NN 10_1101-436634 91 3 of of IN 10_1101-436634 91 4 significant significant JJ 10_1101-436634 91 5 events event NNS 10_1101-436634 91 6 increased increase VBD 10_1101-436634 91 7 as as IN 10_1101-436634 91 8 the the DT 10_1101-436634 91 9 splice splice NN 10_1101-436634 91 10 variant variant JJ 10_1101-436634 91 11 window window NN 10_1101-436634 91 12 size size NN 10_1101-436634 91 13 increased increase VBD 10_1101-436634 91 14 , , , 10_1101-436634 91 15 with with IN 10_1101-436634 91 16 both both CC 10_1101-436634 91 17 the the DT 10_1101-436634 91 18 E e NN 10_1101-436634 91 19 and and CC 10_1101-436634 91 20 I -PRON- PRP 10_1101-436634 91 21 results result VBZ 10_1101-436634 91 22 being be VBG 10_1101-436634 91 23 comparable comparable JJ 10_1101-436634 91 24 in in IN 10_1101-436634 91 25 number number NN 10_1101-436634 91 26 . . . 
10_1101-436634 92 1 Notably notably RB 10_1101-436634 92 2 , , , 10_1101-436634 92 3 hepatocellular hepatocellular JJ 10_1101-436634 92 4 carcinoma carcinoma NN 10_1101-436634 92 5 ( ( -LRB- 10_1101-436634 92 6 HCC HCC NNP 10_1101-436634 92 7 ) ) -RRB- 10_1101-436634 92 8 was be VBD 10_1101-436634 92 9 the the DT 10_1101-436634 92 10 only only JJ 10_1101-436634 92 11 cohort cohort NN 10_1101-436634 92 12 that that WDT 10_1101-436634 92 13 had have VBD 10_1101-436634 92 14 whole whole JJ 10_1101-436634 92 15 genome genome JJ 10_1101-436634 92 16 sequencing sequencing NN 10_1101-436634 92 17 ( ( -LRB- 10_1101-436634 92 18 WGS WGS NNP 10_1101-436634 92 19 ) ) -RRB- 10_1101-436634 92 20 data datum NNS 10_1101-436634 92 21 available available JJ 10_1101-436634 92 22 and and CC 10_1101-436634 92 23 , , , 10_1101-436634 92 24 as as IN 10_1101-436634 92 25 expected expect VBN 10_1101-436634 92 26 , , , 10_1101-436634 92 27 it -PRON- PRP 10_1101-436634 92 28 exhibited exhibit VBD 10_1101-436634 92 29 a a DT 10_1101-436634 92 30 marked marked JJ 10_1101-436634 92 31 increase increase NN 10_1101-436634 92 32 in in IN 10_1101-436634 92 33 the the DT 10_1101-436634 92 34 number number NN 10_1101-436634 92 35 of of IN 10_1101-436634 92 36 significant significant JJ 10_1101-436634 92 37 events event NNS 10_1101-436634 92 38 for for IN 10_1101-436634 92 39 its -PRON- PRP$ 10_1101-436634 92 40 results result NNS 10_1101-436634 92 41 within within IN 10_1101-436634 92 42 the the DT 10_1101-436634 92 43 “ " `` 10_1101-436634 92 44 I -PRON- PRP 10_1101-436634 92 45 ” " '' 10_1101-436634 92 46 splice splice NN 10_1101-436634 92 47 variant variant JJ 10_1101-436634 92 48 window window NN 10_1101-436634 92 49 . . . 
10_1101-436634 93 1 This this DT 10_1101-436634 93 2 observation observation NN 10_1101-436634 93 3 highlights highlight VBZ 10_1101-436634 93 4 the the DT 10_1101-436634 93 5 low low JJ 10_1101-436634 93 6 sequence sequence NN 10_1101-436634 93 7 coverage coverage NN 10_1101-436634 93 8 of of IN 10_1101-436634 93 9 intronic intronic JJ 10_1101-436634 93 10 regions region NNS 10_1101-436634 93 11 that that WDT 10_1101-436634 93 12 occurs occur VBZ 10_1101-436634 93 13 with with IN 10_1101-436634 93 14 WES WES NNP 10_1101-436634 93 15 which which WDT 10_1101-436634 93 16 subsequently subsequently RB 10_1101-436634 93 17 leads lead VBZ 10_1101-436634 93 18 to to IN 10_1101-436634 93 19 underpowered underpowered JJ 10_1101-436634 93 20 discovery discovery NN 10_1101-436634 93 21 of of IN 10_1101-436634 93 22 potential potential JJ 10_1101-436634 93 23 splice splice NN 10_1101-436634 93 24 altering alter VBG 10_1101-436634 93 25 variants variant NNS 10_1101-436634 93 26 within within IN 10_1101-436634 93 27 introns intron NNS 10_1101-436634 93 28 . . . 10_1101-436634 94 1 Variants variant NNS 10_1101-436634 94 2 were be VBD 10_1101-436634 94 3 analyzed analyze VBN 10_1101-436634 94 4 across across IN 10_1101-436634 94 5 tumor tumor NN 10_1101-436634 94 6 types type NNS 10_1101-436634 94 7 for for IN 10_1101-436634 94 8 how how WRB 10_1101-436634 94 9 often often RB 10_1101-436634 94 10 they -PRON- PRP 10_1101-436634 94 11 result result VBP 10_1101-436634 94 12 in in IN 10_1101-436634 94 13 either either CC 10_1101-436634 94 14 a a DT 10_1101-436634 94 15 single single JJ 10_1101-436634 94 16 or or CC 10_1101-436634 94 17 multiple multiple JJ 10_1101-436634 94 18 novel novel JJ 10_1101-436634 94 19 junctions junction NNS 10_1101-436634 94 20 ( ( -LRB- 10_1101-436634 94 21 Figure Figure NNP 10_1101-436634 94 22 3A 3A NNP 10_1101-436634 94 23 ) ) -RRB- 10_1101-436634 94 24 . . . 
10_1101-436634 95 1 While while IN 10_1101-436634 95 2 a a DT 10_1101-436634 95 3 single single JJ 10_1101-436634 95 4 variant variant NN 10_1101-436634 95 5 resulting resulting NN 10_1101-436634 95 6 in in IN 10_1101-436634 95 7 a a DT 10_1101-436634 95 8 single single JJ 10_1101-436634 95 9 novel novel NN 10_1101-436634 95 10 junction junction NN 10_1101-436634 95 11 is be VBZ 10_1101-436634 95 12 most most RBS 10_1101-436634 95 13 commonly commonly RB 10_1101-436634 95 14 observed observe VBN 10_1101-436634 95 15 ( ( -LRB- 10_1101-436634 95 16 72.27 72.27 CD 10_1101-436634 95 17 - - HYPH 10_1101-436634 95 18 83.78 83.78 CD 10_1101-436634 95 19 % % NN 10_1101-436634 95 20 ) ) -RRB- 10_1101-436634 95 21 , , , 10_1101-436634 95 22 a a DT 10_1101-436634 95 23 single single JJ 10_1101-436634 95 24 variant variant NN 10_1101-436634 95 25 also also RB 10_1101-436634 95 26 commonly commonly RB 10_1101-436634 95 27 results result VBZ 10_1101-436634 95 28 in in IN 10_1101-436634 95 29 multiple multiple JJ 10_1101-436634 95 30 junctions junction NNS 10_1101-436634 95 31 being be VBG 10_1101-436634 95 32 created create VBN 10_1101-436634 95 33 , , , 10_1101-436634 95 34 either either DT 10_1101-436634 95 35 of of IN 10_1101-436634 95 36 the the DT 10_1101-436634 95 37 same same JJ 10_1101-436634 95 38 type type NN 10_1101-436634 95 39 ( ( -LRB- 10_1101-436634 95 40 6.56 6.56 CD 10_1101-436634 95 41 - - SYM 10_1101-436634 95 42 10.94 10.94 CD 10_1101-436634 95 43 % % NN 10_1101-436634 95 44 ) ) -RRB- 10_1101-436634 95 45 or or CC 10_1101-436634 95 46 of of IN 10_1101-436634 95 47 different different JJ 10_1101-436634 95 48 types type NNS 10_1101-436634 95 49 ( ( -LRB- 10_1101-436634 95 50 9.66- 9.66- CD 10_1101-436634 95 51 16.79 16.79 CD 10_1101-436634 95 52 % % NN 10_1101-436634 95 53 ) ) -RRB- 10_1101-436634 95 54 ( ( -LRB- 10_1101-436634 95 55 Figure figure NN 10_1101-436634 95 56 3B 3b NN 10_1101-436634 95 57 ) ) -RRB- 10_1101-436634 95 58 . . . 
10_1101-436634 96 1 Variants variant NNS 10_1101-436634 96 2 that that WDT 10_1101-436634 96 3 are be VBP 10_1101-436634 96 4 associated associate VBN 10_1101-436634 96 5 with with IN 10_1101-436634 96 6 multiple multiple JJ 10_1101-436634 96 7 novel novel JJ 10_1101-436634 96 8 junctions junction NNS 10_1101-436634 96 9 of of IN 10_1101-436634 96 10 different different JJ 10_1101-436634 96 11 types type NNS 10_1101-436634 96 12 were be VBD 10_1101-436634 96 13 further further RB 10_1101-436634 96 14 investigated investigate VBN 10_1101-436634 96 15 to to TO 10_1101-436634 96 16 identify identify VB 10_1101-436634 96 17 how how WRB 10_1101-436634 96 18 often often RB 10_1101-436634 96 19 a a DT 10_1101-436634 96 20 particular particular JJ 10_1101-436634 96 21 junction junction NN 10_1101-436634 96 22 type type NN 10_1101-436634 96 23 occurred occur VBD 10_1101-436634 96 24 with with IN 10_1101-436634 96 25 another another DT 10_1101-436634 96 26 ( ( -LRB- 10_1101-436634 96 27 Figure figure NN 10_1101-436634 96 28 3C 3c NN 10_1101-436634 96 29 ) ) -RRB- 10_1101-436634 96 30 . . . 10_1101-436634 97 1 Most most RBS 10_1101-436634 97 2 commonly commonly RB 10_1101-436634 97 3 , , , 10_1101-436634 97 4 we -PRON- PRP 10_1101-436634 97 5 observed observe VBD 10_1101-436634 97 6 an an DT 10_1101-436634 97 7 alternate alternate JJ 10_1101-436634 97 8 donor donor NN 10_1101-436634 97 9 or or CC 10_1101-436634 97 10 acceptor acceptor NN 10_1101-436634 97 11 site site NN 10_1101-436634 97 12 being be VBG 10_1101-436634 97 13 used use VBN 10_1101-436634 97 14 in in IN 10_1101-436634 97 15 conjunction conjunction NN 10_1101-436634 97 16 with with IN 10_1101-436634 97 17 an an DT 10_1101-436634 97 18 exon exon JJ 10_1101-436634 97 19 skipping skipping NN 10_1101-436634 97 20 event event NN 10_1101-436634 97 21 . . . 
10_1101-436634 98 1 These these DT 10_1101-436634 98 2 events event NNS 10_1101-436634 98 3 were be VBD 10_1101-436634 98 4 particularly particularly RB 10_1101-436634 98 5 common common JJ 10_1101-436634 98 6 within within IN 10_1101-436634 98 7 the the DT 10_1101-436634 98 8 default default NN 10_1101-436634 98 9 window window NN 10_1101-436634 98 10 ( ( -LRB- 10_1101-436634 98 11 2 2 CD 10_1101-436634 98 12 intronic intronic JJ 10_1101-436634 98 13 bases basis NNS 10_1101-436634 98 14 or or CC 10_1101-436634 98 15 3 3 CD 10_1101-436634 98 16 exonic exonic JJ 10_1101-436634 98 17 bases basis NNS 10_1101-436634 98 18 from from IN 10_1101-436634 98 19 the the DT 10_1101-436634 98 20 exon exon JJ 10_1101-436634 98 21 edge edge NN 10_1101-436634 98 22 ) ) -RRB- 10_1101-436634 98 23 , , , 10_1101-436634 98 24 as as IN 10_1101-436634 98 25 a a DT 10_1101-436634 98 26 SNV SNV NNP 10_1101-436634 98 27 or or CC 10_1101-436634 98 28 indel indel NN 10_1101-436634 98 29 within within IN 10_1101-436634 98 30 these these DT 10_1101-436634 98 31 positions position NNS 10_1101-436634 98 32 has have VBZ 10_1101-436634 98 33 a a DT 10_1101-436634 98 34 high high JJ 10_1101-436634 98 35 probability probability NN 10_1101-436634 98 36 of of IN 10_1101-436634 98 37 disrupting disrupt VBG 10_1101-436634 98 38 the the DT 10_1101-436634 98 39 natural natural JJ 10_1101-436634 98 40 splice splice NN 10_1101-436634 98 41 site site NN 10_1101-436634 98 42 , , , 10_1101-436634 98 43 thus thus RB 10_1101-436634 98 44 causing cause VBG 10_1101-436634 98 45 the the DT 10_1101-436634 98 46 splicing splice VBG 10_1101-436634 98 47 machinery machinery NN 10_1101-436634 98 48 to to TO 10_1101-436634 98 49 use use VB 10_1101-436634 98 50 a a DT 10_1101-436634 98 51 cryptic cryptic JJ 10_1101-436634 98 52 splice splice NN 10_1101-436634 98 53 site site NN 10_1101-436634 98 54 nearby nearby RB 10_1101-436634 98 55 or or CC 10_1101-436634 98 56 skip skip VBZ 10_1101-436634 98 57 the the DT 
10_1101-436634 98 58 splice splice NN 10_1101-436634 98 59 site site NN 10_1101-436634 98 60 entirely entirely RB 10_1101-436634 98 61 . . . 10_1101-436634 99 1 The the DT 10_1101-436634 99 2 next next JJ 10_1101-436634 99 3 most most RBS 10_1101-436634 99 4 common common JJ 10_1101-436634 99 5 event event NN 10_1101-436634 99 6 was be VBD 10_1101-436634 99 7 an an DT 10_1101-436634 99 8 alternate alternate JJ 10_1101-436634 99 9 donor donor NN 10_1101-436634 99 10 site site NN 10_1101-436634 99 11 and and CC 10_1101-436634 99 12 an an DT 10_1101-436634 99 13 alternate alternate JJ 10_1101-436634 99 14 acceptor acceptor NN 10_1101-436634 99 15 site site NN 10_1101-436634 99 16 both both DT 10_1101-436634 99 17 being be VBG 10_1101-436634 99 18 used use VBN 10_1101-436634 99 19 as as IN 10_1101-436634 99 20 the the DT 10_1101-436634 99 21 result result NN 10_1101-436634 99 22 of of IN 10_1101-436634 99 23 a a DT 10_1101-436634 99 24 single single JJ 10_1101-436634 99 25 variant variant NN 10_1101-436634 99 26 . . . 
10_1101-436634 100 1 The the DT 10_1101-436634 100 2 combination combination NN 10_1101-436634 100 3 of of IN 10_1101-436634 100 4 a a DT 10_1101-436634 100 5 novel novel JJ 10_1101-436634 100 6 acceptor acceptor NN 10_1101-436634 100 7 site site NN 10_1101-436634 100 8 and and CC 10_1101-436634 100 9 novel novel JJ 10_1101-436634 100 10 donor donor NN 10_1101-436634 100 11 site site NN 10_1101-436634 100 12 being be VBG 10_1101-436634 100 13 used use VBN 10_1101-436634 100 14 in in IN 10_1101-436634 100 15 conjunction conjunction NN 10_1101-436634 100 16 with with IN 10_1101-436634 100 17 an an DT 10_1101-436634 100 18 exon exon NN 10_1101-436634 100 19 - - HYPH 10_1101-436634 100 20 skipping skip VBG 10_1101-436634 100 21 event event NN 10_1101-436634 100 22 occurred occur VBD 10_1101-436634 100 23 the the DT 10_1101-436634 100 24 least least JJS 10_1101-436634 100 25 and and CC 10_1101-436634 100 26 occurrence occurrence NN 10_1101-436634 100 27 of of IN 10_1101-436634 100 28 this this DT 10_1101-436634 100 29 type type NN 10_1101-436634 100 30 of of IN 10_1101-436634 100 31 event event NN 10_1101-436634 100 32 remains remain VBZ 10_1101-436634 100 33 fairly fairly RB 10_1101-436634 100 34 low low JJ 10_1101-436634 100 35 , , , 10_1101-436634 100 36 even even RB 10_1101-436634 100 37 as as IN 10_1101-436634 100 38 the the DT 10_1101-436634 100 39 search search NN 10_1101-436634 100 40 space space NN 10_1101-436634 100 41 increases increase VBZ 10_1101-436634 100 42 within within IN 10_1101-436634 100 43 the the DT 10_1101-436634 100 44 larger large JJR 10_1101-436634 100 45 splice splice NN 10_1101-436634 100 46 variant variant JJ 10_1101-436634 100 47 windows window NNS 10_1101-436634 100 48 . . . 
10_1101-436634 101 1 This this DT 10_1101-436634 101 2 finding finding NN 10_1101-436634 101 3 indicates indicate VBZ 10_1101-436634 101 4 the the DT 10_1101-436634 101 5 low low JJ 10_1101-436634 101 6 likelihood likelihood NN 10_1101-436634 101 7 of of IN 10_1101-436634 101 8 a a DT 10_1101-436634 101 9 single single JJ 10_1101-436634 101 10 variant variant NN 10_1101-436634 101 11 resulting resulting NN 10_1101-436634 101 12 in in IN 10_1101-436634 101 13 simultaneous simultaneous JJ 10_1101-436634 101 14 disruption disruption NN 10_1101-436634 101 15 of of IN 10_1101-436634 101 16 a a DT 10_1101-436634 101 17 splice splice NN 10_1101-436634 101 18 acceptor acceptor NN 10_1101-436634 101 19 and and CC 10_1101-436634 101 20 donor donor NN 10_1101-436634 101 21 as as RB 10_1101-436634 101 22 well well RB 10_1101-436634 101 23 as as IN 10_1101-436634 101 24 complete complete JJ 10_1101-436634 101 25 skipping skipping NN 10_1101-436634 101 26 of of IN 10_1101-436634 101 27 an an DT 10_1101-436634 101 28 exon exon NN 10_1101-436634 101 29 . . . 10_1101-436634 102 1 Overall overall RB 10_1101-436634 102 2 , , , 10_1101-436634 102 3 this this DT 10_1101-436634 102 4 analysis analysis NN 10_1101-436634 102 5 highlights highlight VBZ 10_1101-436634 102 6 that that IN 10_1101-436634 102 7 there there EX 10_1101-436634 102 8 is be VBZ 10_1101-436634 102 9 evidence evidence NN 10_1101-436634 102 10 that that IN 10_1101-436634 102 11 a a DT 10_1101-436634 102 12 single single JJ 10_1101-436634 102 13 variant variant NN 10_1101-436634 102 14 can can MD 10_1101-436634 102 15 lead lead VB 10_1101-436634 102 16 to to IN 10_1101-436634 102 17 multiple multiple JJ 10_1101-436634 102 18 novel novel JJ 10_1101-436634 102 19 junctions junction NNS 10_1101-436634 102 20 being be VBG 10_1101-436634 102 21 expressed express VBN 10_1101-436634 102 22 . . . 
10_1101-436634 103 1 Tools tool NNS 10_1101-436634 103 2 that that WDT 10_1101-436634 103 3 only only RB 10_1101-436634 103 4 allow allow VBP 10_1101-436634 103 5 for for IN 10_1101-436634 103 6 a a DT 10_1101-436634 103 7 single single JJ 10_1101-436634 103 8 junction junction NN 10_1101-436634 103 9 to to TO 10_1101-436634 103 10 be be VB 10_1101-436634 103 11 predicted predict VBN 10_1101-436634 103 12 or or CC 
10_1101-436634 104 31 associated associate VBN 10_1101-436634 104 32 with with IN 10_1101-436634 104 33 a a DT 10_1101-436634 104 34 variant variant NN 10_1101-436634 104 35 therefore therefore RB 10_1101-436634 104 36 may may MD 10_1101-436634 104 37 not not RB 10_1101-436634 104 38 be be VB 10_1101-436634 104 39 completely completely RB 10_1101-436634 104 40 describing describe VBG 10_1101-436634 104 41 the the DT 10_1101-436634 104 42 effect effect NN 10_1101-436634 104 43 of of IN 10_1101-436634 104 44 the the DT 10_1101-436634 104 45 variant variant NN 10_1101-436634 104 46 in in IN 10_1101-436634 104 47 question question NN 10_1101-436634 104 48 in in IN 10_1101-436634 104 49 up up IN 10_1101-436634 104 50 to to IN 
10_1101-436634 104 51 ~27 ~27 . 10_1101-436634 104 52 % % NN 10_1101-436634 104 53 of of IN 10_1101-436634 104 54 cases case NNS 10_1101-436634 104 55 . . . 10_1101-436634 105 1 RegTools RegTools NNP 10_1101-436634 105 2 identifies identifies NNP 10_1101-436634 105 3 splice splice NN 10_1101-436634 105 4 altering alter VBG 10_1101-436634 105 5 variants variant NNS 10_1101-436634 105 6 missed miss VBN 10_1101-436634 105 7 by by IN 10_1101-436634 105 8 other other JJ 10_1101-436634 105 9 splice splice NN 10_1101-436634 105 10 variant variant JJ 10_1101-436634 105 11 predictors predictor NNS 10_1101-436634 105 12 and and CC 10_1101-436634 105 13 annotators annotator NNS 10_1101-436634 105 14 To to TO 10_1101-436634 105 15 evaluate evaluate VB 10_1101-436634 105 16 the the DT 10_1101-436634 105 17 performance performance NN 10_1101-436634 105 18 of of IN 10_1101-436634 105 19 RegTools RegTools NNP 10_1101-436634 105 20 , , , 10_1101-436634 105 21 we -PRON- PRP 10_1101-436634 105 22 compared compare VBD 10_1101-436634 105 23 our -PRON- PRP$ 10_1101-436634 105 24 results result NNS 10_1101-436634 105 25 to to IN 10_1101-436634 105 26 those those DT 10_1101-436634 105 27 of of IN 10_1101-436634 105 28 SAVNet SAVNet NNP 10_1101-436634 105 29 , , , 10_1101-436634 105 30 MiSplice MiSplice NNP 10_1101-436634 105 31 , , , 10_1101-436634 105 32 Veridical Veridical NNP 10_1101-436634 105 33 , , , 10_1101-436634 105 34 VEP VEP NNP 10_1101-436634 105 35 , , , 10_1101-436634 105 36 and and CC 10_1101-436634 105 37 SpliceAI13,20,21,23,25 SpliceAI13,20,21,23,25 NNP 10_1101-436634 105 38 . . . 
10_1101-436634 106 1 These these DT 10_1101-436634 106 2 tools tool NNS 10_1101-436634 106 3 vary vary VBP 10_1101-436634 106 4 in in IN 10_1101-436634 106 5 their -PRON- PRP$ 10_1101-436634 106 6 inputs input NNS 10_1101-436634 106 7 and and CC 10_1101-436634 106 8 methodology methodology NN 10_1101-436634 106 9 for for IN 10_1101-436634 106 10 identifying identify VBG 10_1101-436634 106 11 splice splice NN 10_1101-436634 106 12 altering alter VBG 10_1101-436634 106 13 variants variant NNS 10_1101-436634 106 14 ( ( -LRB- 10_1101-436634 106 15 Figure Figure NNP 10_1101-436634 106 16 4A 4A NNP 10_1101-436634 106 17 ) ) -RRB- 10_1101-436634 106 18 . . . 10_1101-436634 107 1 Both both CC 10_1101-436634 107 2 VEP VEP NNP 10_1101-436634 107 3 and and CC 10_1101-436634 107 4 SpliceAI SpliceAI VBN 10_1101-436634 107 5 only only RB 10_1101-436634 107 6 consider consider VB 10_1101-436634 107 7 information information NN 10_1101-436634 107 8 about about IN 10_1101-436634 107 9 the the DT 10_1101-436634 107 10 variant variant JJ 10_1101-436634 107 11 and and CC 10_1101-436634 107 12 its -PRON- PRP$ 10_1101-436634 107 13 genomic genomic JJ 10_1101-436634 107 14 sequence sequence NN 10_1101-436634 107 15 context context NN 10_1101-436634 107 16 and and CC 10_1101-436634 107 17 do do VBP 10_1101-436634 107 18 not not RB 10_1101-436634 107 19 consider consider VB 10_1101-436634 107 20 information information NN 10_1101-436634 107 21 from from IN 10_1101-436634 107 22 a a DT 10_1101-436634 107 23 sample sample NN 10_1101-436634 107 24 ’s ’s , 10_1101-436634 107 25 transcriptome transcriptome JJS 10_1101-436634 107 26 . . . 
10_1101-436634 108 1 A a DT 10_1101-436634 108 2 variant variant NN 10_1101-436634 108 3 is be VBZ 10_1101-436634 108 4 considered consider VBN 10_1101-436634 108 5 to to TO 10_1101-436634 108 6 be be VB 10_1101-436634 108 7 splice splice NN 10_1101-436634 108 8 relevant relevant JJ 10_1101-436634 108 9 according accord VBG 10_1101-436634 108 10 to to IN 10_1101-436634 108 11 VEP VEP NNP 10_1101-436634 108 12 if if IN 10_1101-436634 108 13 it -PRON- PRP 10_1101-436634 108 14 occurs occur VBZ 10_1101-436634 108 15 within within IN 10_1101-436634 108 16 1 1 CD 10_1101-436634 108 17 - - SYM 10_1101-436634 108 18 3 3 CD 10_1101-436634 108 19 bases basis NNS 10_1101-436634 108 20 on on IN 10_1101-436634 108 21 the the DT 10_1101-436634 108 22 exonic exonic JJ 10_1101-436634 108 23 side side NN 10_1101-436634 108 24 or or CC 10_1101-436634 108 25 1 1 CD 10_1101-436634 108 26 - - SYM 10_1101-436634 108 27 8 8 CD 10_1101-436634 108 28 bases basis NNS 10_1101-436634 108 29 on on IN 10_1101-436634 108 30 the the DT 10_1101-436634 108 31 intronic intronic JJ 10_1101-436634 108 32 side side NN 10_1101-436634 108 33 of of IN 10_1101-436634 108 34 a a DT 10_1101-436634 108 35 splice splice NN 10_1101-436634 108 36 site site NN 10_1101-436634 108 37 . . . 
10_1101-436634 109 1 SpliceAI SpliceAI VBN 10_1101-436634 109 2 does do VBZ 10_1101-436634 109 3 not not RB 10_1101-436634 109 4 have have VB 10_1101-436634 109 5 restrictions restriction NNS 10_1101-436634 109 6 on on IN 10_1101-436634 109 7 where where WRB 10_1101-436634 109 8 the the DT 10_1101-436634 109 9 variant variant NN 10_1101-436634 109 10 can can MD 10_1101-436634 109 11 occur occur VB 10_1101-436634 109 12 in in IN 10_1101-436634 109 13 relation relation NN 10_1101-436634 109 14 to to IN 10_1101-436634 109 15 the the DT 10_1101-436634 109 16 splice splice NN 10_1101-436634 109 17 site site NN 10_1101-436634 109 18 but but CC 10_1101-436634 109 19 by by IN 10_1101-436634 109 20 default default NN 10_1101-436634 109 21 , , , 10_1101-436634 109 22 it -PRON- PRP 10_1101-436634 109 23 predicts predict VBZ 10_1101-436634 109 24 one one CD 10_1101-436634 109 25 new new JJ 10_1101-436634 109 26 donor donor NN 10_1101-436634 109 27 and and CC 10_1101-436634 109 28 acceptor acceptor NN 10_1101-436634 109 29 site site NN 10_1101-436634 109 30 within within IN 10_1101-436634 109 31 50 50 CD 10_1101-436634 109 32 bp bp NN 10_1101-436634 109 33 of of IN 10_1101-436634 109 34 the the DT 10_1101-436634 109 35 variant variant NN 10_1101-436634 109 36 , , , 10_1101-436634 109 37 based base VBN 10_1101-436634 109 38 on on IN 10_1101-436634 109 39 reference reference NN 10_1101-436634 109 40 transcript transcript NN 10_1101-436634 109 41 sequences sequence NNS 10_1101-436634 109 42 from from IN 10_1101-436634 109 43 GENCODE GENCODE NNP 10_1101-436634 109 44 . . . 
10_1101-436634 110 1 Like like IN 10_1101-436634 110 2 RegTools RegTools NNP 10_1101-436634 110 3 , , , 10_1101-436634 110 4 SAVNet SAVNet NNP 10_1101-436634 110 5 , , , 10_1101-436634 110 6 MiSplice MiSplice NNP 10_1101-436634 110 7 , , , 10_1101-436634 110 8 and and CC 10_1101-436634 110 9 Veridical veridical JJ 10_1101-436634 110 10 integrate integrate VB 10_1101-436634 110 11 genomic genomic JJ 10_1101-436634 110 12 and and CC 10_1101-436634 110 13 transcriptomic transcriptomic JJ 10_1101-436634 110 14 data datum NNS 10_1101-436634 110 15 in in IN 10_1101-436634 110 16 order order NN 10_1101-436634 110 17 to to TO 10_1101-436634 110 18 identify identify VB 10_1101-436634 110 19 splice splice NN 10_1101-436634 110 20 altering altering NN 10_1101-436634 110 21 variants variant NNS 10_1101-436634 110 22 . . . 10_1101-436634 111 1 MiSplice MiSplice NNP 10_1101-436634 111 2 only only RB 10_1101-436634 111 3 considers consider VBZ 10_1101-436634 111 4 junctions junction NNS 10_1101-436634 111 5 that that WDT 10_1101-436634 111 6 occur occur VBP 10_1101-436634 111 7 within within IN 10_1101-436634 111 8 20 20 CD 10_1101-436634 111 9 bp bp NN 10_1101-436634 111 10 of of IN 10_1101-436634 111 11 the the DT 10_1101-436634 111 12 variant variant NN 10_1101-436634 111 13 . . . 10_1101-436634 112 1 Additionally additionally RB 10_1101-436634 112 2 , , , 10_1101-436634 112 3 SAVNet SAVNet NNP 10_1101-436634 112 4 , , , 10_1101-436634 112 5 MiSplice MiSplice NNP 10_1101-436634 112 6 , , , 10_1101-436634 112 7 and and CC 10_1101-436634 112 8 Veridical veridical JJ 10_1101-436634 112 9 filter filter NN 10_1101-436634 112 10 out out RP 10_1101-436634 112 11 any any DT 10_1101-436634 112 12 transcripts transcript NNS 10_1101-436634 112 13 found find VBN 10_1101-436634 112 14 within within IN 10_1101-436634 112 15 the the DT 10_1101-436634 112 16 reference reference NN 10_1101-436634 112 17 transcriptome transcriptome VBN 10_1101-436634 112 18 . . . 
10_1101-436634 113 1 SAVNet SAVNet NNP 10_1101-436634 113 2 , , , 10_1101-436634 113 3 MiSplice MiSplice NNP 10_1101-436634 113 4 , , , 10_1101-436634 113 5 and and CC 10_1101-436634 113 6 Veridical veridical JJ 10_1101-436634 113 7 employ employ JJ 10_1101-436634 113 8 different different JJ 10_1101-436634 113 9 statistical statistical JJ 10_1101-436634 113 10 methods method NNS 10_1101-436634 113 11 for for IN 10_1101-436634 113 12 the the DT 10_1101-436634 113 13 identification identification NN 10_1101-436634 113 14 of of IN 10_1101-436634 113 15 splice splice NN 10_1101-436634 113 16 altering altering NN 10_1101-436634 113 17 variants variant NNS 10_1101-436634 113 18 . . . 10_1101-436634 114 1 In in IN 10_1101-436634 114 2 contrast contrast NN 10_1101-436634 114 3 to to IN 10_1101-436634 114 4 RegTools RegTools NNP 10_1101-436634 114 5 , , , 10_1101-436634 114 6 none none NN 10_1101-436634 114 7 of of IN 10_1101-436634 114 8 the the DT 10_1101-436634 114 9 mentioned mention VBN 10_1101-436634 114 10 tools tool NNS 10_1101-436634 114 11 allow allow VBP 10_1101-436634 114 12 the the DT 10_1101-436634 114 13 user user NN 10_1101-436634 114 14 to to TO 10_1101-436634 114 15 set set VB 10_1101-436634 114 16 a a DT 10_1101-436634 114 17 custom custom NN 10_1101-436634 114 18 window window NN 10_1101-436634 114 19 in in IN 10_1101-436634 114 20 which which WDT 10_1101-436634 114 21 they -PRON- PRP 10_1101-436634 114 22 wish wish VBP 10_1101-436634 114 23 to to TO 10_1101-436634 114 24 focus focus VB 10_1101-436634 114 25 splice splice NN 10_1101-436634 114 26 altering alter VBG 10_1101-436634 114 27 variant variant JJ 10_1101-436634 114 28 discovery discovery NN 10_1101-436634 114 29 ( ( -LRB- 10_1101-436634 114 30 e.g. e.g. 
RB 10_1101-436634 115 1 around around IN 10_1101-436634 115 2 the the DT 10_1101-436634 115 3 splice splice NN 10_1101-436634 115 4 site site NN 10_1101-436634 115 5 , , , 10_1101-436634 115 6 all all DT 10_1101-436634 115 7 exonic exonic JJ 10_1101-436634 115 8 variants variant NNS 10_1101-436634 115 9 , , , 10_1101-436634 115 10 etc etc FW 10_1101-436634 115 11 . . . 10_1101-436634 115 12 ) ) -RRB- 10_1101-436634 115 13 . . . 10_1101-436634 116 1 These these DT 10_1101-436634 116 2 tools tool NNS 10_1101-436634 116 3 have have VBP 10_1101-436634 116 4 different different JJ 10_1101-436634 116 5 levels level NNS 10_1101-436634 116 6 of of IN 10_1101-436634 116 7 code code NN 10_1101-436634 116 8 availability availability NN 10_1101-436634 116 9 . . . 10_1101-436634 117 1 MiSplice MiSplice NNP 10_1101-436634 117 2 is be VBZ 10_1101-436634 117 3 available available JJ 10_1101-436634 117 4 via via IN 10_1101-436634 117 5 GitHub GitHub NNP 10_1101-436634 117 6 as as IN 10_1101-436634 117 7 a a DT 10_1101-436634 117 8 collection collection NN 10_1101-436634 117 9 of of IN 10_1101-436634 117 10 Perl Perl NNP 10_1101-436634 117 11 scripts script NNS 10_1101-436634 117 12 that that WDT 10_1101-436634 117 13 are be VBP 10_1101-436634 117 14 built build VBN 10_1101-436634 117 15 to to TO 10_1101-436634 117 16 run run VB 10_1101-436634 117 17 via via IN 10_1101-436634 117 18 Load Load NNP 10_1101-436634 117 19 Sharing Sharing NNP 10_1101-436634 117 20 Facility Facility NNP 10_1101-436634 117 21 ( ( -LRB- 10_1101-436634 117 22 LSF LSF NNP 10_1101-436634 117 23 ) ) -RRB- 10_1101-436634 117 24 job job NN 10_1101-436634 117 25 scheduling scheduling NN 10_1101-436634 117 26 . . . 
10_1101-436634 118 1 To to TO 10_1101-436634 118 2 run run VB 10_1101-436634 118 3 MiSplice MiSplice NNP 10_1101-436634 118 4 without without IN 10_1101-436634 118 5 an an DT 10_1101-436634 118 6 LSF LSF NNP 10_1101-436634 118 7 cluster cluster NN 10_1101-436634 118 8 , , , 10_1101-436634 118 9 the the DT 10_1101-436634 118 10 authors author NNS 10_1101-436634 118 11 mention mention VBP 10_1101-436634 118 12 code code NN 10_1101-436634 118 13 changes change NNS 10_1101-436634 118 14 are be VBP 10_1101-436634 118 15 required require VBN 10_1101-436634 118 16 . . . 10_1101-436634 119 1 Veridical veridical JJ 10_1101-436634 119 2 is be VBZ 10_1101-436634 119 3 available available JJ 10_1101-436634 119 4 via via IN 10_1101-436634 119 5 a a DT 10_1101-436634 119 6 subscription subscription NN 10_1101-436634 119 7 through through IN 10_1101-436634 119 8 CytoGnomix CytoGnomix NNP 10_1101-436634 119 9 ’s ’s POS 10_1101-436634 119 10 MutationForecaster MutationForecaster NNP 10_1101-436634 119 11 . . . 10_1101-436634 120 1 Similar similar JJ 10_1101-436634 120 2 to to IN 10_1101-436634 120 3 RegTools RegTools NNP 10_1101-436634 120 4 , , , 10_1101-436634 120 5 SAVNet SAVNet NNP 10_1101-436634 120 6 is be VBZ 10_1101-436634 120 7 available available JJ 10_1101-436634 120 8 via via IN 10_1101-436634 120 9 GitHub GitHub NNP 10_1101-436634 120 10 or or CC 10_1101-436634 120 11 through through IN 10_1101-436634 120 12 a a DT 10_1101-436634 120 13 Docker Docker NNP 10_1101-436634 120 14 image image NN 10_1101-436634 120 15 . . . 
10_1101-436634 121 1 However however RB 10_1101-436634 121 2 , , , 10_1101-436634 121 3 SAVNet SAVNet NNP 10_1101-436634 121 4 relies rely VBZ 10_1101-436634 121 5 on on IN 10_1101-436634 121 6 splicing splicing NN 10_1101-436634 121 7 junction junction NN 10_1101-436634 121 8 files file NNS 10_1101-436634 121 9 generated generate VBN 10_1101-436634 121 10 by by IN 10_1101-436634 121 11 STAR29 STAR29 NNP 10_1101-436634 121 12 whereas whereas IN 10_1101-436634 121 13 RegTools regtool NNS 10_1101-436634 121 14 can can MD 10_1101-436634 121 15 use use VB 10_1101-436634 121 16 RNA RNA NNP 10_1101-436634 121 17 - - HYPH 10_1101-436634 121 18 Seq seq NN 10_1101-436634 121 19 alignment alignment NN 10_1101-436634 121 20 files file NNS 10_1101-436634 121 21 from from IN 10_1101-436634 121 22 HISAT230 HISAT230 NNP 10_1101-436634 121 23 , , , 10_1101-436634 121 24 TopHat231 TopHat231 NNP 10_1101-436634 121 25 , , , 10_1101-436634 121 26 or or CC 10_1101-436634 121 27 STAR STAR NNP 10_1101-436634 121 28 , , , 10_1101-436634 121 29 thus thus RB 10_1101-436634 121 30 allowing allow VBG 10_1101-436634 121 31 it -PRON- PRP 10_1101-436634 121 32 to to TO 10_1101-436634 121 33 be be VB 10_1101-436634 121 34 integrated integrate VBN 10_1101-436634 121 35 into into IN 10_1101-436634 121 36 bioinformatic bioinformatic JJ 10_1101-436634 121 37 workflows workflow NNS 10_1101-436634 121 38 more more RBR 10_1101-436634 121 39 easily easily RB 10_1101-436634 121 40 . . . 
10_1101-436634 122 1 In in IN 10_1101-436634 122 2 their -PRON- PRP$ 10_1101-436634 122 3 recent recent JJ 10_1101-436634 122 4 publications publication NNS 10_1101-436634 122 5 , , , 10_1101-436634 122 6 SAVNet23 SAVNet23 NNP 10_1101-436634 122 7 , , , 10_1101-436634 122 8 MiSplice20 MiSplice20 NNP 10_1101-436634 122 9 , , , 10_1101-436634 122 10 and and CC 10_1101-436634 122 11 Veridical21,22 Veridical21,22 NNP 10_1101-436634 122 12 also also RB 10_1101-436634 122 13 analyzed analyze VBD 10_1101-436634 122 14 data datum NNS 10_1101-436634 122 15 from from IN 10_1101-436634 122 16 TCGA TCGA NNP 10_1101-436634 122 17 , , , 10_1101-436634 122 18 with with IN 10_1101-436634 122 19 only only JJ 10_1101-436634 122 20 minor minor JJ 10_1101-436634 122 21 differences difference NNS 10_1101-436634 122 22 in in IN 10_1101-436634 122 23 the the DT 10_1101-436634 122 24 number number NN 10_1101-436634 122 25 of of IN 10_1101-436634 122 26 samples sample NNS 10_1101-436634 122 27 included include VBN 10_1101-436634 122 28 for for IN 10_1101-436634 122 29 each each DT 10_1101-436634 122 30 study study NN 10_1101-436634 122 31 . . . 10_1101-436634 123 1 VEP VEP NNP 10_1101-436634 123 2 and and CC 10_1101-436634 123 3 SpliceAI spliceai JJ 10_1101-436634 123 4 results result NNS 10_1101-436634 123 5 were be VBD 10_1101-436634 123 6 obtained obtain VBN 10_1101-436634 123 7 by by IN 10_1101-436634 123 8 running run VBG 10_1101-436634 123 9 each each DT 10_1101-436634 123 10 tool tool NN 10_1101-436634 123 11 on on IN 10_1101-436634 123 12 all all DT 10_1101-436634 123 13 starting start VBG 10_1101-436634 123 14 variants variant NNS 10_1101-436634 123 15 for for IN 10_1101-436634 123 16 the the DT 10_1101-436634 123 17 35 35 CD 10_1101-436634 123 18 cohorts cohort NNS 10_1101-436634 123 19 included include VBN 10_1101-436634 123 20 in in IN 10_1101-436634 123 21 this this DT 10_1101-436634 123 22 study study NN 10_1101-436634 123 23 . . . 
10_1101-436634 124 1 In in IN 10_1101-436634 124 2 order order NN 10_1101-436634 124 3 to to TO 10_1101-436634 124 4 efficiently efficiently RB 10_1101-436634 124 5 compare compare VB 10_1101-436634 124 6 this this DT 10_1101-436634 124 7 data data NN 10_1101-436634 124 8 , , , 10_1101-436634 124 9 an an DT 10_1101-436634 124 10 UpSet UpSet NNP 10_1101-436634 124 11 plot plot NN 10_1101-436634 124 12 ( ( -LRB- 10_1101-436634 124 13 Figure figure NN 10_1101-436634 124 14 4B 4b NN 10_1101-436634 124 15 ) ) -RRB- 10_1101-436634 124 16 was be VBD 10_1101-436634 124 17 created32 created32 VBN 10_1101-436634 124 18 . . . 10_1101-436634 125 1 Only only RB 10_1101-436634 125 2 343 343 CD 10_1101-436634 125 3 variants variant NNS 10_1101-436634 125 4 are be VBP 10_1101-436634 125 5 identified identify VBN 10_1101-436634 125 6 as as IN 10_1101-436634 125 7 splice splice NN 10_1101-436634 125 8 altering alter VBG 10_1101-436634 125 9 by by IN 10_1101-436634 125 10 all all DT 10_1101-436634 125 11 six six CD 10_1101-436634 125 12 tools tool NNS 10_1101-436634 125 13 . . . 
10_1101-436634 126 1 Comparatively comparatively RB 10_1101-436634 126 2 , , , 10_1101-436634 126 3 MiSplice MiSplice NNP 10_1101-436634 126 4 and and CC 10_1101-436634 126 5 SAVNet SAVNet NNP 10_1101-436634 126 6 find find VBP 10_1101-436634 126 7 few few JJ 10_1101-436634 126 8 splice splice NN 10_1101-436634 126 9 altering alter VBG 10_1101-436634 126 10 variants variant NNS 10_1101-436634 126 11 , , , 10_1101-436634 126 12 potentially potentially RB 10_1101-436634 126 13 indicating indicate VBG 10_1101-436634 126 14 that that IN 10_1101-436634 126 15 these these DT 10_1101-436634 126 16 tools tool NNS 10_1101-436634 126 17 are be VBP 10_1101-436634 126 18 overlooking overlook VBG 10_1101-436634 126 19 the the DT 10_1101-436634 126 20 complete complete JJ 10_1101-436634 126 21 set set NN 10_1101-436634 126 22 of of IN 10_1101-436634 126 23 variants variant NNS 10_1101-436634 126 24 that that WDT 10_1101-436634 126 25 have have VBP 10_1101-436634 126 26 an an DT 10_1101-436634 126 27 effect effect NN 10_1101-436634 126 28 on on IN 10_1101-436634 126 29 splicing splicing NN 10_1101-436634 126 30 . . . 
10_1101-436634 127 1 In in IN 10_1101-436634 127 2 contrast contrast NN 10_1101-436634 127 3 , , , 10_1101-436634 127 4 Veridical veridical JJ 10_1101-436634 127 5 identifies identifie NNS 10_1101-436634 127 6 by by IN 10_1101-436634 127 7 far far RB 10_1101-436634 127 8 the the DT 10_1101-436634 127 9 most most JJS 10_1101-436634 127 10 splice splice NN 10_1101-436634 127 11 altering alter VBG 10_1101-436634 127 12 variants variant NNS 10_1101-436634 127 13 across across IN 10_1101-436634 127 14 all all DT 10_1101-436634 127 15 tools tool NNS 10_1101-436634 127 16 , , , 10_1101-436634 127 17 with with IN 10_1101-436634 127 18 94.54 94.54 CD 10_1101-436634 127 19 percent percent NN 10_1101-436634 127 20 of of IN 10_1101-436634 127 21 its -PRON- PRP$ 10_1101-436634 127 22 calls call NNS 10_1101-436634 127 23 being be VBG 10_1101-436634 127 24 found find VBN 10_1101-436634 127 25 by by IN 10_1101-436634 127 26 it -PRON- PRP 10_1101-436634 127 27 alone alone RB 10_1101-436634 127 28 . . . 10_1101-436634 128 1 SpliceAI SpliceAI VBN 10_1101-436634 128 2 and and CC 10_1101-436634 128 3 VEP VEP NNP 10_1101-436634 128 4 called call VBD 10_1101-436634 128 5 a a DT 10_1101-436634 128 6 large large JJ 10_1101-436634 128 7 number number NN 10_1101-436634 128 8 of of IN 10_1101-436634 128 9 variants variant NNS 10_1101-436634 128 10 , , , 10_1101-436634 128 11 either either CC 10_1101-436634 128 12 alone alone RB 10_1101-436634 128 13 or or CC 10_1101-436634 128 14 in in IN 10_1101-436634 128 15 agreement agreement NN 10_1101-436634 128 16 , , , 10_1101-436634 128 17 that that IN 10_1101-436634 128 18 none none NN 10_1101-436634 128 19 of of IN 10_1101-436634 128 20 the the DT 10_1101-436634 128 21 tools tool NNS 10_1101-436634 128 22 that that WDT 10_1101-436634 128 23 integrate integrate VBP 10_1101-436634 128 24 transcriptomic transcriptomic JJ 10_1101-436634 128 25 data datum NNS 10_1101-436634 128 26 from from IN 10_1101-436634 128 27 samples sample NNS 10_1101-436634 128 
28 identify identify VBP 10_1101-436634 128 29 . . . 10_1101-436634 129 1 This this DT 10_1101-436634 129 2 highlights highlight VBZ 10_1101-436634 129 3 a a DT 10_1101-436634 129 4 limitation limitation NN 10_1101-436634 129 5 of of IN 10_1101-436634 129 6 using use VBG 10_1101-436634 129 7 tools tool NNS 10_1101-436634 129 8 that that WDT 10_1101-436634 129 9 only only RB 10_1101-436634 129 10 focus focus VBP 10_1101-436634 129 11 on on IN 10_1101-436634 129 12 genomic genomic JJ 10_1101-436634 129 13 data datum NNS 10_1101-436634 129 14 , , , 10_1101-436634 129 15 particularly particularly RB 10_1101-436634 129 16 in in IN 10_1101-436634 129 17 a a DT 10_1101-436634 129 18 disease disease NN 10_1101-436634 129 19 context context NN 10_1101-436634 129 20 where where WRB 10_1101-436634 129 21 transcripts transcript NNS 10_1101-436634 129 22 are be VBP 10_1101-436634 129 23 unlikely unlikely JJ 10_1101-436634 129 24 to to TO 10_1101-436634 129 25 have have VB 10_1101-436634 129 26 been be VBN 10_1101-436634 129 27 annotated annotate VBN 10_1101-436634 129 28 before before RB 10_1101-436634 129 29 . . . 
10_1101-436634 130 1 RegTools RegTools NNP 10_1101-436634 130 2 addresses address VBZ 10_1101-436634 130 3 these these DT 10_1101-436634 130 4 short short JJ 10_1101-436634 130 5 - - HYPH 10_1101-436634 130 6 comings coming NNS 10_1101-436634 130 7 by by IN 10_1101-436634 130 8 identifying identify VBG 10_1101-436634 130 9 what what WP 10_1101-436634 130 10 pieces piece NNS 10_1101-436634 130 11 of of IN 10_1101-436634 130 12 information information NN 10_1101-436634 130 13 to to TO 10_1101-436634 130 14 extract extract VB 10_1101-436634 130 15 from from IN 10_1101-436634 130 16 a a DT 10_1101-436634 130 17 sample sample NN 10_1101-436634 130 18 ’s ’s , 10_1101-436634 130 19 genome genome NN 10_1101-436634 130 20 and and CC 10_1101-436634 130 21 transcriptome transcriptome VBP 10_1101-436634 130 22 in in IN 10_1101-436634 130 23 a a DT 10_1101-436634 130 24 very very RB 10_1101-436634 130 25 basic basic JJ 10_1101-436634 130 26 , , , 10_1101-436634 130 27 unbiased unbiased JJ 10_1101-436634 130 28 way way NN 10_1101-436634 130 29 that that WDT 10_1101-436634 130 30 allows allow VBZ 10_1101-436634 130 31 for for IN 10_1101-436634 130 32 generalization generalization NN 10_1101-436634 130 33 . . . 
10_1101-436634 131 1 Other other JJ
10_1101-436634 132 31 tools tool NNS 10_1101-436634 132 32 either either CC 10_1101-436634 132 33 only only RB 10_1101-436634 132 34 analyze analyze VB 10_1101-436634 132 35 genomic genomic JJ 10_1101-436634 132 36 data datum NNS 10_1101-436634 132 37 , , , 10_1101-436634 132 38 focus focus VBP 10_1101-436634 132 39 on on IN 10_1101-436634 132 40 junctions junction NNS 10_1101-436634 132 41 where where WRB 10_1101-436634 132 42 either either CC 10_1101-436634 132 43 the the DT 10_1101-436634 132 44 canonical canonical JJ 10_1101-436634 132 45 donor donor NN 10_1101-436634 132 46 or or CC 10_1101-436634 132 47 acceptor acceptor NN 10_1101-436634 132 48 site site NN 10_1101-436634 132 49 is be VBZ 10_1101-436634 132 50 
affected affect VBN 10_1101-436634 132 51 ( ( -LRB- 10_1101-436634 132 52 missing miss VBG 10_1101-436634 132 53 junctions junction NNS 10_1101-436634 132 54 that that WDT 10_1101-436634 132 55 result result VBP 10_1101-436634 132 56 from from IN 10_1101-436634 132 57 complete complete JJ 10_1101-436634 132 58 exon exon JJ 10_1101-436634 132 59 skipping skipping NN 10_1101-436634 132 60 ) ) -RRB- 10_1101-436634 132 61 , , , 10_1101-436634 132 62 or or CC 10_1101-436634 132 63 consider consider VB 10_1101-436634 132 64 only only RB 10_1101-436634 132 65 those those DT 10_1101-436634 132 66 variants variant NNS 10_1101-436634 132 67 within within IN 10_1101-436634 132 68 a a DT 10_1101-436634 132 69 very very RB 10_1101-436634 132 70 narrow narrow JJ 10_1101-436634 132 71 distance distance NN 10_1101-436634 132 72 from from IN 10_1101-436634 132 73 known know VBN 10_1101-436634 132 74 splice splice NN 10_1101-436634 132 75 sites site NNS 10_1101-436634 132 76 . . . 10_1101-436634 133 1 RegTools regtool NNS 10_1101-436634 133 2 can can MD 10_1101-436634 133 3 include include VB 10_1101-436634 133 4 any any DT 10_1101-436634 133 5 kind kind NN 10_1101-436634 133 6 of of IN 10_1101-436634 133 7 junction junction NN 10_1101-436634 133 8 type type NN 10_1101-436634 133 9 , , , 10_1101-436634 133 10 including include VBG 10_1101-436634 133 11 exon exon NN 10_1101-436634 133 12 - - HYPH 10_1101-436634 133 13 exon exon NN 10_1101-436634 133 14 junctions junction NNS 10_1101-436634 133 15 that that WDT 10_1101-436634 133 16 have have VBP 10_1101-436634 133 17 ends end NNS 10_1101-436634 133 18 that that WDT 10_1101-436634 133 19 are be VBP 10_1101-436634 133 20 not not RB 10_1101-436634 133 21 known know VBN 10_1101-436634 133 22 donor donor NN 10_1101-436634 133 23 / / SYM 10_1101-436634 133 24 acceptor acceptor NN 10_1101-436634 133 25 sites site NNS 10_1101-436634 133 26 according accord VBG 10_1101-436634 133 27 to to IN 10_1101-436634 133 28 the the DT 10_1101-436634 133 
29 GTF GTF NNP 10_1101-436634 133 30 file file NN 10_1101-436634 133 31 ( ( -LRB- 10_1101-436634 133 32 N n IN 10_1101-436634 133 33 junction junction NN 10_1101-436634 133 34 according accord VBG 10_1101-436634 133 35 to to IN 10_1101-436634 133 36 RegTools RegTools NNP 10_1101-436634 133 37 ) ) -RRB- 10_1101-436634 133 38 , , , 10_1101-436634 133 39 any any DT 10_1101-436634 133 40 distance distance NN 10_1101-436634 133 41 size size NN 10_1101-436634 133 42 to to TO 10_1101-436634 133 43 make make VB 10_1101-436634 133 44 variant variant JJ 10_1101-436634 133 45 - - HYPH 10_1101-436634 133 46 junction junction NN 10_1101-436634 133 47 associations association NNS 10_1101-436634 133 48 , , , 10_1101-436634 133 49 and and CC 10_1101-436634 133 50 any any DT 10_1101-436634 133 51 window window NN 10_1101-436634 133 52 size size NN 10_1101-436634 133 53 in in IN 10_1101-436634 133 54 which which WDT 10_1101-436634 133 55 to to TO 10_1101-436634 133 56 consider consider VB 10_1101-436634 133 57 variants variant NNS 10_1101-436634 133 58 . . . 
10_1101-436634 134 1 Due due IN 10_1101-436634 134 2 to to IN 10_1101-436634 134 3 these these DT 10_1101-436634 134 4 advantages advantage NNS 10_1101-436634 134 5 , , , 10_1101-436634 134 6 RegTools RegTools NNP 10_1101-436634 134 7 identified identify VBN 10_1101-436634 134 8 events event NNS 10_1101-436634 134 9 missed miss VBN 10_1101-436634 134 10 by by IN 10_1101-436634 134 11 one one CD 10_1101-436634 134 12 or or CC 10_1101-436634 134 13 multiple multiple NN 10_1101-436634 134 14 of of IN 10_1101-436634 134 15 the the DT 10_1101-436634 134 16 tools tool NNS 10_1101-436634 134 17 to to TO 10_1101-436634 134 18 which which WDT 10_1101-436634 134 19 we -PRON- PRP 10_1101-436634 134 20 compared compare VBD 10_1101-436634 134 21 ( ( -LRB- 10_1101-436634 134 22 Figure figure NN 10_1101-436634 134 23 4B 4b NN 10_1101-436634 134 24 ; ; : 10_1101-436634 134 25 Supplementary Supplementary NNP 10_1101-436634 134 26 Figures Figures NNPS 10_1101-436634 134 27 4 4 CD 10_1101-436634 134 28 and and CC 10_1101-436634 134 29 5 5 CD 10_1101-436634 134 30 ) ) -RRB- 10_1101-436634 134 31 . . . 
10_1101-436634 135 1 Pan pan JJ 10_1101-436634 135 2 - - JJ 10_1101-436634 135 3 cancer cancer JJ 10_1101-436634 135 4 analysis analysis NN 10_1101-436634 135 5 reveals reveal VBZ 10_1101-436634 135 6 novel novel JJ 10_1101-436634 135 7 splicing splicing NN 10_1101-436634 135 8 patterns pattern NNS 10_1101-436634 135 9 within within IN 10_1101-436634 135 10 known know VBN 10_1101-436634 135 11 cancer cancer NN 10_1101-436634 135 12 genes gene NNS 10_1101-436634 135 13 and and CC 10_1101-436634 135 14 potential potential JJ 10_1101-436634 135 15 cancer cancer NN 10_1101-436634 135 16 drivers driver NNS 10_1101-436634 135 17 While while IN 10_1101-436634 135 18 efforts effort NNS 10_1101-436634 135 19 have have VBP 10_1101-436634 135 20 been be VBN 10_1101-436634 135 21 made make VBN 10_1101-436634 135 22 to to TO 10_1101-436634 135 23 associate associate JJ 10_1101-436634 135 24 variants variant NNS 10_1101-436634 135 25 with with IN 10_1101-436634 135 26 specific specific JJ 10_1101-436634 135 27 cancer cancer NN 10_1101-436634 135 28 types type NNS 10_1101-436634 135 29 , , , 10_1101-436634 135 30 there there EX 10_1101-436634 135 31 has have VBZ 10_1101-436634 135 32 been be VBN 10_1101-436634 135 33 little little JJ 10_1101-436634 135 34 focus focus NN 10_1101-436634 135 35 on on IN 10_1101-436634 135 36 identifying identify VBG 10_1101-436634 135 37 such such JJ 10_1101-436634 135 38 associations association NNS 10_1101-436634 135 39 in in IN 10_1101-436634 135 40 splice splice NN 10_1101-436634 135 41 - - HYPH 10_1101-436634 135 42 altering alter VBG 10_1101-436634 135 43 variants variant NNS 10_1101-436634 135 44 , , , 10_1101-436634 135 45 even even RB 10_1101-436634 135 46 those those DT 10_1101-436634 135 47 in in IN 10_1101-436634 135 48 known know VBN 10_1101-436634 135 49 cancer cancer NN 10_1101-436634 135 50 genes gene NNS 10_1101-436634 135 51 . . . 
10_1101-436634 136 1 TP53 tp53 NN 10_1101-436634 136 2 is be VBZ 10_1101-436634 136 3 a a DT 10_1101-436634 136 4 rare rare JJ 10_1101-436634 136 5 example example NN 10_1101-436634 136 6 whose whose WP$ 10_1101-436634 136 7 splice splice NN 10_1101-436634 136 8 - - HYPH 10_1101-436634 136 9 altering alter VBG 10_1101-436634 136 10 variants variant NNS 10_1101-436634 136 11 are be VBP 10_1101-436634 136 12 well well RB 10_1101-436634 136 13 characterized characterize VBN 10_1101-436634 136 14 in in IN 10_1101-436634 136 15 numerous numerous JJ 10_1101-436634 136 16 cancer cancer NN 10_1101-436634 136 17 types33 types33 IN 10_1101-436634 136 18 . . . 10_1101-436634 137 1 As as IN 10_1101-436634 137 2 such such JJ 10_1101-436634 137 3 , , , 10_1101-436634 137 4 we -PRON- PRP 10_1101-436634 137 5 further further RB 10_1101-436634 137 6 analyzed analyze VBD 10_1101-436634 137 7 significant significant JJ 10_1101-436634 137 8 events event NNS 10_1101-436634 137 9 to to TO 10_1101-436634 137 10 identify identify VB 10_1101-436634 137 11 genes gene NNS 10_1101-436634 137 12 that that WDT 10_1101-436634 137 13 had have VBD 10_1101-436634 137 14 recurrent recurrent JJ 10_1101-436634 137 15 splice splice NN 10_1101-436634 137 16 altering alter VBG 10_1101-436634 137 17 variants variant NNS 10_1101-436634 137 18 . . . 
10_1101-436634 138 1 Within within IN 10_1101-436634 138 2 each each DT 10_1101-436634 138 3 cohort cohort NN 10_1101-436634 138 4 , , , 10_1101-436634 138 5 we -PRON- PRP 10_1101-436634 138 6 looked look VBD 10_1101-436634 138 7 for for IN 10_1101-436634 138 8 recurrent recurrent JJ 10_1101-436634 138 9 genes gene NNS 10_1101-436634 138 10 using use VBG 10_1101-436634 138 11 two two CD 10_1101-436634 138 12 separate separate JJ 10_1101-436634 138 13 metrics metric NNS 10_1101-436634 138 14 : : : 10_1101-436634 138 15 a a DT 10_1101-436634 138 16 binomial binomial JJ 10_1101-436634 138 17 test test NN 10_1101-436634 138 18 p p NN 10_1101-436634 138 19 - - HYPH 10_1101-436634 138 20 value value NN 10_1101-436634 138 21 and and CC 10_1101-436634 138 22 the the DT 10_1101-436634 138 23 fraction fraction NN 10_1101-436634 138 24 of of IN 10_1101-436634 138 25 samples sample NNS 10_1101-436634 138 26 ( ( -LRB- 10_1101-436634 138 27 see see VB 10_1101-436634 138 28 Methods method NNS 10_1101-436634 138 29 ) ) -RRB- 10_1101-436634 138 30 . . . 10_1101-436634 139 1 For for IN 10_1101-436634 139 2 ranking rank VBG 10_1101-436634 139 3 and and CC 10_1101-436634 139 4 selecting select VBG 10_1101-436634 139 5 the the DT 10_1101-436634 139 6 most most RBS 10_1101-436634 139 7 recurrent recurrent JJ 10_1101-436634 139 8 genes gene NNS 10_1101-436634 139 9 , , , 10_1101-436634 139 10 each each DT 10_1101-436634 139 11 metric metric JJ 10_1101-436634 139 12 was be VBD 10_1101-436634 139 13 computed compute VBN 10_1101-436634 139 14 by by IN 10_1101-436634 139 15 pooling pool VBG 10_1101-436634 139 16 across across IN 10_1101-436634 139 17 all all DT 10_1101-436634 139 18 cohorts cohort NNS 10_1101-436634 139 19 . . . 
10_1101-436634 140 1 For for IN 10_1101-436634 140 2 assessing assess VBG 10_1101-436634 140 3 cancer cancer NN 10_1101-436634 140 4 - - HYPH 10_1101-436634 140 5 type type NN 10_1101-436634 140 6 specificity specificity NN 10_1101-436634 140 7 , , , 10_1101-436634 140 8 each each DT 10_1101-436634 140 9 metric metric NN 10_1101-436634 140 10 was be VBD 10_1101-436634 140 11 then then RB 10_1101-436634 140 12 also also RB 10_1101-436634 140 13 computed compute VBN 10_1101-436634 140 14 using use VBG 10_1101-436634 140 15 only only RB 10_1101-436634 140 16 results result NNS 10_1101-436634 140 17 from from IN 10_1101-436634 140 18 a a DT 10_1101-436634 140 19 given give VBN 10_1101-436634 140 20 cancer cancer NN 10_1101-436634 140 21 cohort cohort NN 10_1101-436634 140 22 . . . 10_1101-436634 141 1 Since since IN 10_1101-436634 141 2 the the DT 10_1101-436634 141 3 mechanisms mechanism NNS 10_1101-436634 141 4 underlying underlie VBG 10_1101-436634 141 5 the the DT 10_1101-436634 141 6 creation creation NN 10_1101-436634 141 7 of of IN 10_1101-436634 141 8 novel novel JJ 10_1101-436634 141 9 junctions junction NNS 10_1101-436634 141 10 versus versus IN 10_1101-436634 141 11 the the DT 10_1101-436634 141 12 disruption disruption NN 10_1101-436634 141 13 of of IN 10_1101-436634 141 14 existing exist VBG 10_1101-436634 141 15 splicing splicing NN 10_1101-436634 141 16 patterns pattern NNS 10_1101-436634 141 17 may may MD 10_1101-436634 141 18 be be VB 10_1101-436634 141 19 different different JJ 10_1101-436634 141 20 , , , 10_1101-436634 141 21 analysis analysis NN 10_1101-436634 141 22 was be VBD 10_1101-436634 141 23 performed perform VBN 10_1101-436634 141 24 separately separately RB 10_1101-436634 141 25 for for IN 10_1101-436634 141 26 D d NN 10_1101-436634 141 27 / / SYM 10_1101-436634 141 28 A A NNP 10_1101-436634 141 29 / / SYM 10_1101-436634 141 30 NDA NDA NNP 10_1101-436634 141 31 junctions junction NNS 10_1101-436634 141 32 ( ( -LRB- 10_1101-436634 141 33 
Figure Figure NNP 10_1101-436634 141 34 5 5 CD 10_1101-436634 141 35 , , , 10_1101-436634 141 36 Supplementary Supplementary NNP 10_1101-436634 141 37 Figure Figure NNP 10_1101-436634 141 38 6 6 CD 10_1101-436634 141 39 , , , 10_1101-436634 141 40 Supplementary Supplementary NNP 10_1101-436634 141 41 File File NNP 10_1101-436634 141 42 5 5 CD 10_1101-436634 141 43 ) ) -RRB- 10_1101-436634 141 44 and and CC 10_1101-436634 141 45 DA DA NNP 10_1101-436634 141 46 junctions junction NNS 10_1101-436634 141 47 ( ( -LRB- 10_1101-436634 141 48 Supplementary Supplementary NNP 10_1101-436634 141 49 Figure Figure NNP 10_1101-436634 141 50 7 7 CD 10_1101-436634 141 51 , , , 10_1101-436634 141 52 Supplementary Supplementary NNP 10_1101-436634 141 53 File File NNP 10_1101-436634 141 54 6 6 CD 10_1101-436634 141 55 ) ) -RRB- 10_1101-436634 141 56 , , , 10_1101-436634 141 57 which which WDT 10_1101-436634 141 58 allowed allow VBD 10_1101-436634 141 59 multiple multiple JJ 10_1101-436634 141 60 test test NN 10_1101-436634 141 61 correction correction NN 10_1101-436634 141 62 in in IN 10_1101-436634 141 63 accordance accordance NN 10_1101-436634 141 64 with with IN 10_1101-436634 141 65 the the DT 10_1101-436634 141 66 noise noise NN 10_1101-436634 141 67 of of IN 10_1101-436634 141 68 the the DT 10_1101-436634 141 69 respective respective JJ 10_1101-436634 141 70 data datum NNS 10_1101-436634 141 71 . . . 
10_1101-436634 142 1 We -PRON- PRP 10_1101-436634 142 2 identified identify VBD 10_1101-436634 142 3 6,954 6,954 CD 10_1101-436634 142 4 genes gene NNS 10_1101-436634 142 5 in in IN 10_1101-436634 142 6 which which WDT 10_1101-436634 142 7 there there EX 10_1101-436634 142 8 was be VBD 10_1101-436634 142 9 least least RBS 10_1101-436634 142 10 one one CD 10_1101-436634 142 11 variant variant NN 10_1101-436634 142 12 predicted predict VBN 10_1101-436634 142 13 to to TO 10_1101-436634 142 14 influence influence VB 10_1101-436634 142 15 the the DT 10_1101-436634 142 16 splicing splicing NN 10_1101-436634 142 17 of of IN 10_1101-436634 142 18 a a DT 10_1101-436634 142 19 D d NN 10_1101-436634 142 20 / / SYM 10_1101-436634 142 21 A A NNP 10_1101-436634 142 22 / / SYM 10_1101-436634 142 23 NDA NDA NNP 10_1101-436634 142 24 junction junction NN 10_1101-436634 142 25 . . . 10_1101-436634 143 1 The the DT 10_1101-436634 143 2 99th 99th JJ 10_1101-436634 143 3 percentile percentile NN 10_1101-436634 143 4 of of IN 10_1101-436634 143 5 these these DT 10_1101-436634 143 6 genes gene NNS 10_1101-436634 143 7 , , , 10_1101-436634 143 8 when when WRB 10_1101-436634 143 9 ranked rank VBN 10_1101-436634 143 10 by by IN 10_1101-436634 143 11 either either DT 10_1101-436634 143 12 metric metric NN 10_1101-436634 143 13 , , , 10_1101-436634 143 14 are be VBP 10_1101-436634 143 15 significantly significantly RB 10_1101-436634 143 16 enriched enrich VBN 10_1101-436634 143 17 for for IN 10_1101-436634 143 18 known know VBN 10_1101-436634 143 19 cancer cancer NN 10_1101-436634 143 20 genes gene NNS 10_1101-436634 143 21 , , , 10_1101-436634 143 22 as as IN 10_1101-436634 143 23 annotated annotate VBN 10_1101-436634 143 24 by by IN 10_1101-436634 143 25 the the DT 10_1101-436634 143 26 CGC CGC NNP 10_1101-436634 143 27 ( ( -LRB- 10_1101-436634 143 28 p=1.26E-19 p=1.26E-19 NNP 10_1101-436634 143 29 , , , 10_1101-436634 143 30 ranked rank VBN 10_1101-436634 143 31 by by IN 10_1101-436634 143 
32 binomial binomial JJ 10_1101-436634 143 33 p p NN 10_1101-436634 143 34 - - HYPH 10_1101-436634 143 35 values value NNS 10_1101-436634 143 36 , , , 10_1101-436634 143 37 p=2.97E-24 p=2.97E-24 NNP 10_1101-436634 143 38 , , , 10_1101-436634 143 39 ranked rank VBN 10_1101-436634 143 40 by by IN 10_1101-436634 143 41 fraction fraction NN 10_1101-436634 143 42 of of IN 10_1101-436634 143 43 samples sample NNS 10_1101-436634 143 44 ; ; : 10_1101-436634 143 45 hypergeometric hypergeometric NN 10_1101-436634 143 46 test test NN 10_1101-436634 143 47 ) ) -RRB- 10_1101-436634 143 48 . . . 10_1101-436634 144 1 We -PRON- PRP 10_1101-436634 144 2 also also RB 10_1101-436634 144 3 identified identify VBD 10_1101-436634 144 4 3,643 3,643 CD 10_1101-436634 144 5 genes gene NNS 10_1101-436634 144 6 in in IN 10_1101-436634 144 7 which which WDT 10_1101-436634 144 8 there there EX 10_1101-436634 144 9 was be VBD 10_1101-436634 144 10 least least RBS 10_1101-436634 144 11 one one CD 10_1101-436634 144 12 variant variant NN 10_1101-436634 144 13 predicted predict VBN 10_1101-436634 144 14 to to TO 10_1101-436634 144 15 influence influence VB 10_1101-436634 144 16 the the DT 10_1101-436634 144 17 splicing splicing NN 10_1101-436634 144 18 of of IN 10_1101-436634 144 19 a a DT 10_1101-436634 144 20 DA DA NNP 10_1101-436634 144 21 ( ( -LRB- 10_1101-436634 144 22 known know VBN 10_1101-436634 144 23 ) ) -RRB- 10_1101-436634 144 24 junction junction NN 10_1101-436634 144 25 . . . 
10_1101-436634 145 1 The the DT 10_1101-436634 145 2 99th 99th JJ 10_1101-436634 145 3 percentile percentile NN 10_1101-436634 145 4 of of IN 10_1101-436634 145 5 these these DT 10_1101-436634 145 6 genes gene NNS 10_1101-436634 145 7 , , , 10_1101-436634 145 8 when when WRB 10_1101-436634 145 9 ranked rank VBN 10_1101-436634 145 10 by by IN 10_1101-436634 145 11 either either DT 10_1101-436634 145 12 metric metric NN 10_1101-436634 145 13 , , , 10_1101-436634 145 14 are be VBP 10_1101-436634 145 15 also also RB 10_1101-436634 145 16 significantly significantly RB 10_1101-436634 145 17 enriched enrich VBN 10_1101-436634 145 18 for for IN 10_1101-436634 145 19 known know VBN 10_1101-436634 145 20 cancer cancer NN 10_1101-436634 145 21 genes gene NNS 10_1101-436634 145 22 , , , 10_1101-436634 145 23 as as IN 10_1101-436634 145 24 annotated annotate VBN 10_1101-436634 145 25 by by IN 10_1101-436634 145 26 the the DT 10_1101-436634 145 27 Cancer Cancer NNP 10_1101-436634 145 28 Gene Gene NNP 10_1101-436634 145 29 Census Census NNP 10_1101-436634 145 30 ( ( -LRB- 10_1101-436634 145 31 p=1.00E-04 p=1.00E-04 NNP 10_1101-436634 145 32 , , , 10_1101-436634 145 33 ranked rank VBN 10_1101-436634 145 34 by by IN 10_1101-436634 145 35 binomial binomial JJ 10_1101-436634 145 36 p p NN 10_1101-436634 145 37 - - HYPH 10_1101-436634 145 38 values value NNS 10_1101-436634 145 39 , , , 10_1101-436634 145 40 p=3.56E-07 p=3.56E-07 NNP 10_1101-436634 145 41 , , , 10_1101-436634 145 42 ranked rank VBN 10_1101-436634 145 43 by by IN 10_1101-436634 145 44 fraction fraction NN 10_1101-436634 145 45 of of IN 10_1101-436634 145 46 samples sample NNS 10_1101-436634 145 47 ; ; : 10_1101-436634 145 48 hypergeometric hypergeometric NN 10_1101-436634 145 49 test test NN 10_1101-436634 145 50 ) ) -RRB- 10_1101-436634 145 51 . . . 
10_1101-436634 146 1 We -PRON- PRP 10_1101-436634 146 2 also also RB 10_1101-436634 146 3 performed perform VBD 10_1101-436634 146 4 the the DT 10_1101-436634 146 5 same same JJ 10_1101-436634 146 6 analyses analysis NNS 10_1101-436634 146 7 using use VBG 10_1101-436634 146 8 either either CC 10_1101-436634 146 9 the the DT 10_1101-436634 146 10 TCGA TCGA NNP 10_1101-436634 146 11 or or CC 10_1101-436634 146 12 MGI MGI NNP 10_1101-436634 146 13 cohorts cohort NNS 10_1101-436634 146 14 alone alone RB 10_1101-436634 146 15 . . . 10_1101-436634 147 1 The the DT 10_1101-436634 147 2 TCGA tcga NN 10_1101-436634 147 3 - - HYPH 10_1101-436634 147 4 only only RB 10_1101-436634 147 5 analyses analysis NNS 10_1101-436634 147 6 gave give VBD 10_1101-436634 147 7 very very RB 10_1101-436634 147 8 similar similar JJ 10_1101-436634 147 9 results result NNS 10_1101-436634 147 10 to to IN 10_1101-436634 147 11 the the DT 10_1101-436634 147 12 combined combine VBN 10_1101-436634 147 13 analyses analysis NNS 10_1101-436634 147 14 , , , 10_1101-436634 147 15 with with IN 10_1101-436634 147 16 the the DT 10_1101-436634 147 17 99th 99th JJ 10_1101-436634 147 18 percentile percentile NN 10_1101-436634 147 19 of of IN 10_1101-436634 147 20 genes gene NNS 10_1101-436634 147 21 found find VBN 10_1101-436634 147 22 in in IN 10_1101-436634 147 23 the the DT 10_1101-436634 147 24 D d NN 10_1101-436634 147 25 / / SYM 10_1101-436634 147 26 A A NNP 10_1101-436634 147 27 / / SYM 10_1101-436634 147 28 NDA NDA NNP 10_1101-436634 147 29 and and CC 10_1101-436634 147 30 DA DA NNP 10_1101-436634 147 31 analyses analyse VBZ 10_1101-436634 147 32 again again RB 10_1101-436634 147 33 being be VBG 10_1101-436634 147 34 enriched enrich VBN 10_1101-436634 147 35 for for IN 10_1101-436634 147 36 cancer cancer NN 10_1101-436634 147 37 genes gene NNS 10_1101-436634 147 38 ( ( -LRB- 10_1101-436634 147 39 Supplementary Supplementary NNP 10_1101-436634 147 40 Figures Figures NNPS 10_1101-436634 147 41 8 8 CD 
10_1101-436634 147 42 and and CC 10_1101-436634 147 43 9 9 CD 10_1101-436634 147 44 ; ; : 10_1101-436634 147 45 Supplemental Supplemental NNP 10_1101-436634 147 46 Files Files NNP 10_1101-436634 147 47 5 5 CD 10_1101-436634 147 48 and and CC 10_1101-436634 147 49 6 6 CD 10_1101-436634 147 50 ) ) -RRB- 10_1101-436634 147 51 . . . 10_1101-436634 148 1 Due due IN 10_1101-436634 148 2 to to IN 10_1101-436634 148 3 small small JJ 10_1101-436634 148 4 cohort cohort NN 10_1101-436634 148 5 sizes size NNS 10_1101-436634 148 6 , , , 10_1101-436634 148 7 in in IN 10_1101-436634 148 8 the the DT 10_1101-436634 148 9 MGI MGI NNP 10_1101-436634 148 10 - - HYPH 10_1101-436634 148 11 only only RB 10_1101-436634 148 12 analyses analysis NNS 10_1101-436634 148 13 , , , 10_1101-436634 148 14 we -PRON- PRP 10_1101-436634 148 15 identified identify VBD 10_1101-436634 148 16 only only RB 10_1101-436634 148 17 329 329 CD 10_1101-436634 148 18 and and CC 10_1101-436634 148 19 208 208 CD 10_1101-436634 148 20 genes gene NNS 10_1101-436634 148 21 in in IN 10_1101-436634 148 22 the the DT 10_1101-436634 148 23 D d NN 10_1101-436634 148 24 / / SYM 10_1101-436634 148 25 A A NNP 10_1101-436634 148 26 / / SYM 10_1101-436634 148 27 NDA NDA NNP 10_1101-436634 148 28 and and CC 10_1101-436634 148 29 DA DA NNP 10_1101-436634 148 30 analyses analysis NNS 10_1101-436634 148 31 , , , 10_1101-436634 148 32 respectively respectively RB 10_1101-436634 148 33 . . . 
10_1101-436634 149 1 The the DT 10_1101-436634 149 2 99th 99th JJ 10_1101-436634 149 3 percentile percentile NN 10_1101-436634 149 4 of of IN 10_1101-436634 149 5 genes gene NNS 10_1101-436634 149 6 from from IN 10_1101-436634 149 7 these these DT 10_1101-436634 149 8 analyses analysis NNS 10_1101-436634 149 9 , , , 10_1101-436634 149 10 respectively respectively RB 10_1101-436634 149 11 , , , 10_1101-436634 149 12 were be VBD 10_1101-436634 149 13 not not RB 10_1101-436634 149 14 significantly significantly RB 10_1101-436634 149 15 enriched enrich VBN 10_1101-436634 149 16 for for IN 10_1101-436634 149 17 cancer cancer NN 10_1101-436634 149 18 genes gene NNS 10_1101-436634 149 19 ( ( -LRB- 10_1101-436634 149 20 Supplementary Supplementary NNP 10_1101-436634 149 21 Figures Figures NNPS 10_1101-436634 149 22 10 10 CD 10_1101-436634 149 23 and and CC 10_1101-436634 149 24 11 11 CD 10_1101-436634 149 25 ; ; : 10_1101-436634 149 26 Supplemental Supplemental NNP 10_1101-436634 149 27 Files Files NNP 10_1101-436634 149 28 5 5 CD 10_1101-436634 149 29 and and CC 10_1101-436634 149 30 6 6 CD 10_1101-436634 149 31 ) ) -RRB- 10_1101-436634 149 32 . . . 
10_1101-436634 150 1 When when WRB 10_1101-436634 150 2 analyzing analyze VBG 10_1101-436634 150 3 D d NN 10_1101-436634 150 4 , , , 10_1101-436634 150 5 A a NN 10_1101-436634 150 6 , , , 10_1101-436634 150 7 and and CC 10_1101-436634 150 8 NDA NDA NNP 10_1101-436634 150 9 junctions junction NNS 10_1101-436634 150 10 , , , 10_1101-436634 150 11 we -PRON- PRP 10_1101-436634 150 12 saw see VBD 10_1101-436634 150 13 an an DT 10_1101-436634 150 14 enrichment enrichment NN 10_1101-436634 150 15 for for IN 10_1101-436634 150 16 known know VBN 10_1101-436634 150 17 tumor tumor NN 10_1101-436634 150 18 suppressor suppressor NN 10_1101-436634 150 19 genes gene NNS 10_1101-436634 150 20 among among IN 10_1101-436634 150 21 the the DT 10_1101-436634 150 22 most most JJS 10_1101-436634 150 23 splice splice NN 10_1101-436634 150 24 disrupted disrupt VBN 10_1101-436634 150 25 genes gene NNS 10_1101-436634 150 26 , , , 10_1101-436634 150 27 including include VBG 10_1101-436634 150 28 several several JJ 10_1101-436634 150 29 examples example NNS 10_1101-436634 150 30 where where WRB 10_1101-436634 150 31 splice splice NN 
10_1101-436634 151 31 disruption disruption NN 10_1101-436634 151 32 is be VBZ 10_1101-436634 151 33 a a DT 10_1101-436634 151 34 known know VBN 10_1101-436634 151 35 mechanism mechanism NN 10_1101-436634 151 36 such such JJ 10_1101-436634 151 37 as as IN 10_1101-436634 151 38 TP53 TP53 NNP 10_1101-436634 151 39 , , , 10_1101-436634 151 40 PTEN PTEN NNP 10_1101-436634 151 41 , , , 10_1101-436634 151 42 CDKN2A CDKN2A NNP 10_1101-436634 151 43 , , , 10_1101-436634 151 44 and and CC 10_1101-436634 151 45 RB1 RB1 NNP 10_1101-436634 151 46 . . . 10_1101-436634 152 1 Specifically specifically RB 10_1101-436634 152 2 , , , 10_1101-436634 152 3 in in IN 10_1101-436634 152 4 the the DT 10_1101-436634 152 5 case case NN 10_1101-436634 152 6 of of IN 10_1101-436634 152 7 TP53 TP53 NNP 10_1101-436634 152 8 , , , 10_1101-436634 152 9 we -PRON- PRP 10_1101-436634 152 10 identified identify VBD 10_1101-436634 152 11 428 428 CD 10_1101-436634 152 12 variants variant NNS 10_1101-436634 152 13 that that WDT 10_1101-436634 152 14 were be VBD 10_1101-436634 152 15 significantly significantly RB 10_1101-436634 152 16 associated associate VBN 10_1101-436634 152 17 with with IN 10_1101-436634 152 18 at at RB 10_1101-436634 152 19 least least RBS 10_1101-436634 152 20 one one CD 10_1101-436634 152 21 novel novel NN 10_1101-436634 152 22 splicing splicing NN 10_1101-436634 152 23 event event NN 10_1101-436634 152 24 . . . 
10_1101-436634 153 1 One one CD 10_1101-436634 153 2 such such JJ 10_1101-436634 153 3 example example NN 10_1101-436634 153 4 is be VBZ 10_1101-436634 153 5 the the DT 10_1101-436634 153 6 intronic intronic JJ 10_1101-436634 153 7 SNV SNV NNP 10_1101-436634 153 8 ( ( -LRB- 10_1101-436634 153 9 GRCh38 GRCh38 NNP 10_1101-436634 153 10 , , , 10_1101-436634 153 11 chr17 chr17 NN 10_1101-436634 153 12 : : : 10_1101-436634 153 13 g.7673609C g.7673609c XX 10_1101-436634 153 14 > > XX 10_1101-436634 153 15 A a NN 10_1101-436634 153 16 ) ) -RRB- 10_1101-436634 153 17 that that WDT 10_1101-436634 153 18 was be VBD 10_1101-436634 153 19 identified identify VBN 10_1101-436634 153 20 in in IN 10_1101-436634 153 21 an an DT 10_1101-436634 153 22 OSCC OSCC NNP 10_1101-436634 153 23 sample sample NN 10_1101-436634 153 24 and and CC 10_1101-436634 153 25 was be VBD 10_1101-436634 153 26 associated associate VBN 10_1101-436634 153 27 with with IN 10_1101-436634 153 28 an an DT 10_1101-436634 153 29 exon exon JJ 10_1101-436634 153 30 skipping skipping NN 10_1101-436634 153 31 event event NN 10_1101-436634 153 32 and and CC 10_1101-436634 153 33 an an DT 10_1101-436634 153 34 alternate alternate JJ 10_1101-436634 153 35 acceptor acceptor NN 10_1101-436634 153 36 site site NN 10_1101-436634 153 37 usage usage NN 10_1101-436634 153 38 event event NN 10_1101-436634 153 39 , , , 10_1101-436634 153 40 with with IN 10_1101-436634 153 41 23 23 CD 10_1101-436634 153 42 and and CC 10_1101-436634 153 43 41 41 CD 10_1101-436634 153 44 reads read NNS 10_1101-436634 153 45 of of IN 10_1101-436634 153 46 support support NN 10_1101-436634 153 47 , , , 10_1101-436634 153 48 respectively respectively RB 10_1101-436634 153 49 ( ( -LRB- 10_1101-436634 153 50 Supplemental Supplemental NNP 10_1101-436634 153 51 Figure Figure NNP 10_1101-436634 153 52 12 12 CD 10_1101-436634 153 53 ) ) -RRB- 10_1101-436634 153 54 . . . 
10_1101-436634 154 1 The the DT 10_1101-436634 154 2 cancer cancer NN 10_1101-436634 154 3 types type NNS 10_1101-436634 154 4 in in IN 10_1101-436634 154 5 which which WDT 10_1101-436634 154 6 we -PRON- PRP 10_1101-436634 154 7 find find VBP 10_1101-436634 154 8 splice splice NN 10_1101-436634 154 9 disruption disruption NN 10_1101-436634 154 10 of of IN 10_1101-436634 154 11 TP53 TP53 NNP 10_1101-436634 154 12 and and CC 10_1101-436634 154 13 other other JJ 10_1101-436634 154 14 known know VBN 10_1101-436634 154 15 cancer cancer NN 10_1101-436634 154 16 genes gene NNS 10_1101-436634 154 17 is be VBZ 10_1101-436634 154 18 in in IN 10_1101-436634 154 19 concordance concordance NN 10_1101-436634 154 20 with with IN 10_1101-436634 154 21 associations association NNS 10_1101-436634 154 22 between between IN 10_1101-436634 154 23 genes gene NNS 10_1101-436634 154 24 and and CC 10_1101-436634 154 25 cancer cancer NN 10_1101-436634 154 26 types type NNS 10_1101-436634 154 27 described describe VBN 10_1101-436634 154 28 by by IN 10_1101-436634 154 29 CGC CGC NNP 10_1101-436634 154 30 and and CC 10_1101-436634 154 31 CHASMplus27,34 CHASMplus27,34 NNP 10_1101-436634 154 32 . . . 
10_1101-436634 155 1 Our -PRON- PRP$ 10_1101-436634 155 2 analysis analysis NN 10_1101-436634 155 3 ’s ’s POS 10_1101-436634 155 4 recovery recovery NN 10_1101-436634 155 5 of of IN 10_1101-436634 155 6 known know VBN 10_1101-436634 155 7 drivers driver NNS 10_1101-436634 155 8 , , , 10_1101-436634 155 9 many many JJ 10_1101-436634 155 10 of of IN 10_1101-436634 155 11 which which WDT 10_1101-436634 155 12 with with IN 10_1101-436634 155 13 known know VBN 10_1101-436634 155 14 susceptibilities susceptibility NNS 10_1101-436634 155 15 to to IN 10_1101-436634 155 16 splicing splice VBG 10_1101-436634 155 17 dysregulation dysregulation NN 10_1101-436634 155 18 in in IN 10_1101-436634 155 19 cancer cancer NN 10_1101-436634 155 20 , , , 10_1101-436634 155 21 indicates indicate VBZ 10_1101-436634 155 22 the the DT 10_1101-436634 155 23 ability ability NN 10_1101-436634 155 24 of of IN 10_1101-436634 155 25 our -PRON- PRP$ 10_1101-436634 155 26 method method NN 10_1101-436634 155 27 to to TO 10_1101-436634 155 28 identify identify VB 10_1101-436634 155 29 true true JJ 10_1101-436634 155 30 splicing splicing NN 10_1101-436634 155 31 effects effect NNS 10_1101-436634 155 32 that that WDT 10_1101-436634 155 33 are be VBP 10_1101-436634 155 34 likely likely RB 10_1101-436634 155 35 cancer cancer NN 10_1101-436634 155 36 - - HYPH 10_1101-436634 155 37 relevant relevant JJ 10_1101-436634 155 38 . . . 10_1101-436634 156 1 Another another DT 10_1101-436634 156 2 cancer cancer NN 10_1101-436634 156 3 gene gene NN 10_1101-436634 156 4 that that WDT 10_1101-436634 156 5 we -PRON- PRP 10_1101-436634 156 6 found find VBD 10_1101-436634 156 7 to to TO 10_1101-436634 156 8 have have VB 10_1101-436634 156 9 a a DT 10_1101-436634 156 10 recurrence recurrence NN 10_1101-436634 156 11 of of IN 10_1101-436634 156 12 splicing splice VBG 10_1101-436634 156 13 altering alter VBG 10_1101-436634 156 14 variants variant NNS 10_1101-436634 156 15 was be VBD 10_1101-436634 156 16 B2M. B2M. 
NNP 10_1101-436634 157 1 Specifically specifically RB 10_1101-436634 157 2 , , , 10_1101-436634 157 3 we -PRON- PRP 10_1101-436634 157 4 identified identify VBD 10_1101-436634 157 5 six six CD 10_1101-436634 157 6 samples sample NNS 10_1101-436634 157 7 with with IN 10_1101-436634 157 8 intronic intronic JJ 10_1101-436634 157 9 variants variant NNS 10_1101-436634 157 10 on on IN 10_1101-436634 157 11 either either DT 10_1101-436634 157 12 side side NN 10_1101-436634 157 13 of of IN 10_1101-436634 157 14 exon exon JJ 10_1101-436634 157 15 2 2 CD 10_1101-436634 157 16 ( ( -LRB- 10_1101-436634 157 17 Figure Figure NNP 10_1101-436634 157 18 6 6 CD 10_1101-436634 157 19 ) ) -RRB- 10_1101-436634 157 20 . . . 10_1101-436634 158 1 While while IN 10_1101-436634 158 2 mutations mutation NNS 10_1101-436634 158 3 have have VBP 10_1101-436634 158 4 been be VBN 10_1101-436634 158 5 identified identify VBN 10_1101-436634 158 6 and and CC 10_1101-436634 158 7 studied study VBN 10_1101-436634 158 8 within within IN 10_1101-436634 158 9 exon exon NNP 10_1101-436634 158 10 2 2 CD 10_1101-436634 158 11 , , , 10_1101-436634 158 12 we -PRON- PRP 10_1101-436634 158 13 did do VBD 10_1101-436634 158 14 not not RB 10_1101-436634 158 15 find find VB 10_1101-436634 158 16 literature literature NN 10_1101-436634 158 17 that that WDT 10_1101-436634 158 18 specifically specifically RB 10_1101-436634 158 19 identified identify VBN 10_1101-436634 158 20 intronic intronic JJ 10_1101-436634 158 21 variants variant NNS 10_1101-436634 158 22 near near IN 10_1101-436634 158 23 exon exon NNP 10_1101-436634 158 24 2 2 CD 10_1101-436634 158 25 as as IN 10_1101-436634 158 26 a a DT 10_1101-436634 158 27 mechanism mechanism NN 10_1101-436634 158 28 for for IN 10_1101-436634 158 29 disrupting disrupt VBG 10_1101-436634 158 30 B2M35 b2m35 NN 10_1101-436634 158 31 . . . 
10_1101-436634 159 1 These these DT 10_1101-436634 159 2 mutations mutation NNS 10_1101-436634 159 3 were be VBD 10_1101-436634 159 4 identified identify VBN 10_1101-436634 159 5 by by IN 10_1101-436634 159 6 VEP VEP NNP 10_1101-436634 159 7 to to TO 10_1101-436634 159 8 be be VB 10_1101-436634 159 9 either either CC 10_1101-436634 159 10 splice splice NN 10_1101-436634 159 11 acceptor acceptor NN 10_1101-436634 159 12 variant variant NN 10_1101-436634 159 13 or or CC 10_1101-436634 159 14 a a DT 10_1101-436634 159 15 splice splice NN 10_1101-436634 159 16 donor donor NN 10_1101-436634 159 17 variant variant JJ 10_1101-436634 159 18 and and CC 10_1101-436634 159 19 were be VBD 10_1101-436634 159 20 also also RB 10_1101-436634 159 21 identified identify VBN 10_1101-436634 159 22 by by IN 10_1101-436634 159 23 Veridical Veridical NNP 10_1101-436634 159 24 . . . 10_1101-436634 160 1 MiSplice MiSplice NNP 10_1101-436634 160 2 was be VBD 10_1101-436634 160 3 able able JJ 10_1101-436634 160 4 to to TO 10_1101-436634 160 5 predict predict VB 10_1101-436634 160 6 one one CD 10_1101-436634 160 7 of of IN 10_1101-436634 160 8 the the DT 10_1101-436634 160 9 novel novel JJ 10_1101-436634 160 10 junctions junction NNS 10_1101-436634 160 11 for for IN 10_1101-436634 160 12 each each DT 10_1101-436634 160 13 variant variant JJ 10_1101-436634 160 14 but but CC 10_1101-436634 160 15 failed fail VBD 10_1101-436634 160 16 to to TO 10_1101-436634 160 17 predict predict VB 10_1101-436634 160 18 additional additional JJ 10_1101-436634 160 19 novel novel JJ 10_1101-436634 160 20 junctions junction NNS 10_1101-436634 160 21 due due JJ 10_1101-436634 160 22 to to IN 10_1101-436634 160 23 the the DT 10_1101-436634 160 24 limitation limitation NN 10_1101-436634 160 25 of of IN 10_1101-436634 160 26 that that DT 10_1101-436634 160 27 tool tool NN 10_1101-436634 160 28 to to TO 10_1101-436634 160 29 only only RB 10_1101-436634 160 30 predict predict VB 10_1101-436634 160 31 one one CD 
10_1101-436634 160 32 novel novel JJ 10_1101-436634 160 33 acceptor acceptor NN 10_1101-436634 160 34 and and CC 10_1101-436634 160 35 donor donor NN 10_1101-436634 160 36 site site NN 10_1101-436634 160 37 per per IN 10_1101-436634 160 38 variant variant NN 10_1101-436634 160 39 . . . 10_1101-436634 161 1 Notably notably RB 10_1101-436634 161 2 , , , 10_1101-436634 161 3 4 4 CD 10_1101-436634 161 4 out out IN 10_1101-436634 161 5 of of IN 10_1101-436634 161 6 the the DT 10_1101-436634 161 7 6 6 CD 10_1101-436634 161 8 samples sample NNS 10_1101-436634 161 9 that that WDT 10_1101-436634 161 10 these these DT 10_1101-436634 161 11 variants variant NNS 10_1101-436634 161 12 were be VBD 10_1101-436634 161 13 found find VBN 10_1101-436634 161 14 in in IN 10_1101-436634 161 15 are be VBP 10_1101-436634 161 16 MSI MSI NNP 10_1101-436634 161 17 - - HYPH 10_1101-436634 161 18 H H NNP 10_1101-436634 161 19 ( ( -LRB- 10_1101-436634 161 20 Microsatellite Microsatellite NNP 10_1101-436634 161 21 instability instability NN 10_1101-436634 161 22 - - HYPH 10_1101-436634 161 23 high high NN 10_1101-436634 161 24 ) ) -RRB- 10_1101-436634 161 25 tumors36 tumors36 NN 10_1101-436634 161 26 . . . 
10_1101-436634 162 1 Mutations mutation NNS 10_1101-436634 162 2 in in IN 10_1101-436634 162 3 B2 B2 NNP 10_1101-436634 162 4 M M NNP 10_1101-436634 162 5 , , , 10_1101-436634 162 6 particularly particularly RB 10_1101-436634 162 7 within within IN 10_1101-436634 162 8 colorectal colorectal JJ 10_1101-436634 162 9 MSI MSI NNP 10_1101-436634 162 10 - - HYPH 10_1101-436634 162 11 H h NN 10_1101-436634 162 12 tumors tumor NNS 10_1101-436634 162 13 , , , 10_1101-436634 162 14 have have VBP 10_1101-436634 162 15 been be VBN 10_1101-436634 162 16 identified identify VBN 10_1101-436634 162 17 as as IN 10_1101-436634 162 18 a a DT 10_1101-436634 162 19 method method NN 10_1101-436634 162 20 for for IN 10_1101-436634 162 21 tumors tumor NNS 10_1101-436634 162 22 to to TO 10_1101-436634 162 23 become become VB 10_1101-436634 162 24 incapable incapable JJ 10_1101-436634 162 25 of of IN 10_1101-436634 162 26 HLA HLA NNP 10_1101-436634 162 27 class class NN 10_1101-436634 162 28 I -PRON- PRP 10_1101-436634 162 29 antigen antigen NN 10_1101-436634 162 30 - - HYPH 10_1101-436634 162 31 mediated mediate VBN 10_1101-436634 162 32 presentation37 presentation37 NNS 10_1101-436634 162 33 . . . 
10_1101-436634 163 1 Furthermore furthermore RB 10_1101-436634 163 2 , , , 10_1101-436634 163 3 in in IN 10_1101-436634 163 4 a a DT 10_1101-436634 163 5 study study NN 10_1101-436634 163 6 of of IN 10_1101-436634 163 7 patients patient NNS 10_1101-436634 163 8 treated treat VBN 10_1101-436634 163 9 with with IN 10_1101-436634 163 10 immune immune JJ 10_1101-436634 163 11 checkpoint checkpoint NN 10_1101-436634 163 12 blockade blockade NN 10_1101-436634 163 13 ( ( -LRB- 10_1101-436634 163 14 ICB ICB NNP 10_1101-436634 163 15 ) ) -RRB- 10_1101-436634 163 16 therapy therapy NN 10_1101-436634 163 17 , , , 10_1101-436634 163 18 defects defect NNS 10_1101-436634 163 19 to to IN 10_1101-436634 163 20 B2 B2 NNP 10_1101-436634 163 21 M M NNP 10_1101-436634 163 22 were be VBD 10_1101-436634 163 23 observed observe VBN 10_1101-436634 163 24 in in IN 10_1101-436634 163 25 29.4 29.4 CD 10_1101-436634 163 26 % % NN 10_1101-436634 163 27 of of IN 10_1101-436634 163 28 patients patient NNS 10_1101-436634 163 29 with with IN 10_1101-436634 163 30 progressing progress VBG 10_1101-436634 163 31 disease38 disease38 NNP 10_1101-436634 163 32 . . . 
10_1101-436634 164 1 In in IN 10_1101-436634 164 2 the the DT 10_1101-436634 164 3 same same JJ 10_1101-436634 164 4 study study NN 10_1101-436634 164 5 , , , 10_1101-436634 164 6 B2 B2 NNP 10_1101-436634 164 7 M M NNP 10_1101-436634 164 8 mutations mutation NNS 10_1101-436634 164 9 were be VBD 10_1101-436634 164 10 exclusively exclusively RB 10_1101-436634 164 11 seen see VBN 10_1101-436634 164 12 in in IN 10_1101-436634 164 13 pretreatment pretreatment JJ 10_1101-436634 164 14 samples sample NNS 10_1101-436634 164 15 from from IN 10_1101-436634 164 16 patients patient NNS 10_1101-436634 164 17 who who WP 10_1101-436634 164 18 did do VBD 10_1101-436634 164 19 not not RB 10_1101-436634 164 20 respond respond VB 10_1101-436634 164 21 to to IN 10_1101-436634 164 22 ICB ICB NNP 10_1101-436634 164 23 or or CC 10_1101-436634 164 24 in in IN 10_1101-436634 164 25 post- post- NNP 10_1101-436634 164 26 progression progression NN 10_1101-436634 164 27 samples sample NNS 10_1101-436634 164 28 after after IN 10_1101-436634 164 29 initial initial JJ 10_1101-436634 164 30 response response NN 10_1101-436634 164 31 to to IN 10_1101-436634 164 32 ICB38 ICB38 NNP 10_1101-436634 164 33 . . . 
10_1101-436634 165 1 There there EX 10_1101-436634 165 2 are be VBP 10_1101-436634 165 3 several several JJ 10_1101-436634 165 4 genes gene NNS 10_1101-436634 165 5 that that WDT 10_1101-436634 165 6 are be VBP 10_1101-436634 165 7 responsible responsible JJ 10_1101-436634 165 8 for for IN 10_1101-436634 165 9 the the DT 10_1101-436634 165 10 processing processing NN 10_1101-436634 165 11 , , , 10_1101-436634 165 12 loading loading NN 10_1101-436634 165 13 , , , 10_1101-436634 165 14 and and CC 10_1101-436634 165 15 presentation presentation NN 10_1101-436634 165 16 of of IN 10_1101-436634 165 17 antigens antigen NNS 10_1101-436634 165 18 , , , 10_1101-436634 165 19 and and CC 10_1101-436634 165 20 have have VBP 10_1101-436634 165 21 been be VBN 10_1101-436634 165 22 shown show VBN 10_1101-436634 165 23 to to TO 10_1101-436634 165 24 be be VB 10_1101-436634 165 25 mutated mutate VBN 10_1101-436634 165 26 in in IN 10_1101-436634 165 27 cancers39 cancers39 NNP 10_1101-436634 165 28 . . . 10_1101-436634 166 1 However however RB 10_1101-436634 166 2 , , , 10_1101-436634 166 3 no no DT 10_1101-436634 166 4 proteins protein NNS 10_1101-436634 166 5 can can MD 10_1101-436634 166 6 be be VB 10_1101-436634 166 7 substituted substitute VBN 10_1101-436634 166 8 for for IN 10_1101-436634 166 9 B2 B2 NNP 10_1101-436634 166 10 M M NNP 10_1101-436634 166 11 in in IN 10_1101-436634 166 12 HLA HLA NNP 10_1101-436634 166 13 class class NN 10_1101-436634 166 14 I -PRON- PRP 10_1101-436634 166 15 presentation presentation NN 10_1101-436634 166 16 , , , 10_1101-436634 166 17 thus thus RB 10_1101-436634 166 18 making make VBG 10_1101-436634 166 19 the the DT 10_1101-436634 166 20 loss loss NN 10_1101-436634 166 21 of of IN 10_1101-436634 166 22 B2 b2 NN 10_1101-436634 166 23 M M NNP 10_1101-436634 166 24 a a DT 10_1101-436634 166 25 particularly particularly RB 10_1101-436634 166 26 robust robust JJ 10_1101-436634 166 27 method method NN 10_1101-436634 166 28 for for IN 10_1101-436634 
166 29 ICB ICB NNP 10_1101-436634 166 30 resistance40 resistance40 NNP 10_1101-436634 166 31 . . . 10_1101-436634 167 1 We -PRON- PRP 10_1101-436634 167 2 also also RB 10_1101-436634 167 3 observe observe VBP 10_1101-436634 167 4 exonic exonic JJ 10_1101-436634 167 5 variants variant NNS 10_1101-436634 167 6 and and CC 10_1101-436634 167 7 variants variant NNS 10_1101-436634 167 8 further far RBR 10_1101-436634 167 9 in in IN 10_1101-436634 167 10 intronic intronic JJ 10_1101-436634 167 11 regions region NNS 10_1101-436634 167 12 that that WDT 10_1101-436634 167 13 disrupt disrupt VBP 10_1101-436634 167 14 canonical canonical JJ 10_1101-436634 167 15 splicing splicing NN 10_1101-436634 167 16 of of IN 10_1101-436634 167 17 B2M. B2M. NNP 10_1101-436634 168 1 These these DT 10_1101-436634 168 2 findings finding NNS 10_1101-436634 168 3 indicate indicate VBP 10_1101-436634 168 4 that that IN 10_1101-436634 168 5 intronic intronic JJ 10_1101-436634 168 6 variants variant NNS 10_1101-436634 168 7 that that WDT 10_1101-436634 168 8 result result VBP 10_1101-436634 168 9 in in IN 10_1101-436634 168 10 alternative alternative JJ 10_1101-436634 168 11 splice splice NN 10_1101-436634 168 12 products product NNS 10_1101-436634 168 13 within within IN 10_1101-436634 168 14 B2 b2 NN 10_1101-436634 168 15 M M NNP 10_1101-436634 168 16 may may MD 10_1101-436634 168 17 be be VB 10_1101-436634 168 18 a a DT 10_1101-436634 168 19 mechanism mechanism NN 10_1101-436634 168 20 for for IN 10_1101-436634 168 21 immune immune JJ 10_1101-436634 168 22 escape escape NN 10_1101-436634 168 23 within within IN 10_1101-436634 168 24 tumor tumor NN 10_1101-436634 168 25 samples sample NNS 10_1101-436634 168 26 . . . 
10_1101-436634 169 1 We -PRON- PRP 10_1101-436634 169 2 also also RB 10_1101-436634 169 3 identify identify VBP 10_1101-436634 169 4 recurrent recurrent JJ 10_1101-436634 169 5 splice splice NN 10_1101-436634 169 6 altering alter VBG 10_1101-436634 169 7 variants variant NNS 10_1101-436634 169 8 in in IN 10_1101-436634 169 9 genes gene NNS 10_1101-436634 169 10 not not RB 10_1101-436634 169 11 known know VBN 10_1101-436634 169 12 to to TO 10_1101-436634 169 13 be be VB 10_1101-436634 169 14 cancer cancer NN 10_1101-436634 169 15 genes gene NNS 10_1101-436634 169 16 ( ( -LRB- 10_1101-436634 169 17 according accord VBG 10_1101-436634 169 18 to to IN 10_1101-436634 169 19 CGC CGC NNP 10_1101-436634 169 20 ) ) -RRB- 10_1101-436634 169 21 , , , 10_1101-436634 169 22 such such JJ 10_1101-436634 169 23 as as IN 10_1101-436634 169 24 RNF145 RNF145 NNP 10_1101-436634 169 25 . . . 10_1101-436634 170 1 RegTools RegTools NNP 10_1101-436634 170 2 identified identify VBD 10_1101-436634 170 3 a a DT 10_1101-436634 170 4 recurrent recurrent JJ 10_1101-436634 170 5 single single JJ 10_1101-436634 170 6 base base NN 10_1101-436634 170 7 pair pair NN 10_1101-436634 170 8 deletion deletion NN 10_1101-436634 170 9 that that WDT 10_1101-436634 170 10 results result VBZ 10_1101-436634 170 11 in in IN 10_1101-436634 170 12 an an DT 10_1101-436634 170 13 exon exon JJ 10_1101-436634 170 14 skipping skipping NN 10_1101-436634 170 15 event event NN 10_1101-436634 170 16 of of IN 10_1101-436634 170 17 exon exon NNP 10_1101-436634 170 18 8 8 CD 10_1101-436634 170 19 ( ( -LRB- 10_1101-436634 170 20 Supplementary Supplementary NNP 10_1101-436634 170 21 Figure Figure NNP 10_1101-436634 170 22 13 13 CD 10_1101-436634 170 23 ) ) -RRB- 10_1101-436634 170 24 . . . 
10_1101-436634 171 1 This this DT 10_1101-436634 171 2 gene gene NN 10_1101-436634 171 3 is be VBZ 10_1101-436634 171 4 a a DT 10_1101-436634 171 5 paralog paralog NN 10_1101-436634 171 6 of of IN 10_1101-436634 171 7 RNF139 RNF139 NNP 10_1101-436634 171 8 , , , 10_1101-436634 171 9 which which WDT 10_1101-436634 171 10 has have VBZ 10_1101-436634 171 11 been be VBN 10_1101-436634 171 12 found find VBN 10_1101-436634 171 13 to to TO 10_1101-436634 171 14 be be VB 10_1101-436634 171 15 mutated mutate VBN 10_1101-436634 171 16 in in IN 10_1101-436634 171 17 several several JJ 10_1101-436634 171 18 cancer cancer NN 10_1101-436634 171 19 types41 types41 NN 10_1101-436634 171 20 . . . 10_1101-436634 172 1 This this DT 10_1101-436634 172 2 variant variant JJ 10_1101-436634 172 3 junction junction NN 10_1101-436634 172 4 association association NN 10_1101-436634 172 5 was be VBD 10_1101-436634 172 6 found find VBN 10_1101-436634 172 7 in in IN 10_1101-436634 172 8 STAD STAD NNP 10_1101-436634 172 9 , , , 10_1101-436634 172 10 UCEC UCEC NNP 10_1101-436634 172 11 , , , 10_1101-436634 172 12 COAD COAD NNP 10_1101-436634 172 13 , , , 10_1101-436634 172 14 and and CC 10_1101-436634 172 15 ESCA esca NN 10_1101-436634 172 16 tumors tumor NNS 10_1101-436634 172 17 , , , 10_1101-436634 172 18 all all DT 10_1101-436634 172 19 of of IN 10_1101-436634 172 20 which which WDT 10_1101-436634 172 21 are be VBP 10_1101-436634 172 22 considered consider VBN 10_1101-436634 172 23 to to TO 10_1101-436634 172 24 be be VB 10_1101-436634 172 25 MSI MSI NNP 10_1101-436634 172 26 - - HYPH 10_1101-436634 172 27 H H NNP 10_1101-436634 172 28 tumors36 tumors36 NN 10_1101-436634 172 29 . . . 
10_1101-436634 173 1 After after IN 10_1101-436634 173 2 analyzing analyze VBG 10_1101-436634 173 3 the the DT 10_1101-436634 173 4 effect effect NN 10_1101-436634 173 5 of of IN 10_1101-436634 173 6 the the DT 10_1101-436634 173 7 exon exon JJ 10_1101-436634 173 8 skipping skipping NN 10_1101-436634 173 9 event event NN 10_1101-436634 173 10 on on IN 10_1101-436634 173 11 the the DT 10_1101-436634 173 12 mRNA mrna NN 10_1101-436634 173 13 sequence sequence NN 10_1101-436634 173 14 , , , 10_1101-436634 173 15 we -PRON- PRP 10_1101-436634 173 16 concluded conclude VBD 10_1101-436634 173 17 that that IN 10_1101-436634 173 18 the the DT 10_1101-436634 173 19 reading reading NN 10_1101-436634 173 20 frame frame NN 10_1101-436634 173 21 remains remain VBZ 10_1101-436634 173 22 intact intact JJ 10_1101-436634 173 23 , , , 10_1101-436634 173 24 possibly possibly RB 10_1101-436634 173 25 leading lead VBG 10_1101-436634 173 26 to to IN 10_1101-436634 173 27 a a DT 10_1101-436634 173 28 gain gain NN 10_1101-436634 173 29 of of IN 10_1101-436634 173 30 function function NN 10_1101-436634 173 31 event event NN 10_1101-436634 173 32 . . . 
10_1101-436634 174 1 Additionally additionally RB 10_1101-436634 174 2 , , , 10_1101-436634 174 3 the the DT 10_1101-436634 174 4 skipping skipping NN 10_1101-436634 174 5 of of IN 10_1101-436634 174 6 exon exon NNP 10_1101-436634 174 7 8 8 CD 10_1101-436634 174 8 leads lead NNS 10_1101-436634 174 9 to to IN 10_1101-436634 174 10 the the DT 10_1101-436634 174 11 removal removal NN 10_1101-436634 174 12 of of IN 10_1101-436634 174 13 a a DT 10_1101-436634 174 14 transmembrane transmembrane JJ 10_1101-436634 174 15 domain domain NN 10_1101-436634 174 16 and and CC 10_1101-436634 174 17 a a DT 10_1101-436634 174 18 phosphorylation phosphorylation NN 10_1101-436634 174 19 site site NN 10_1101-436634 174 20 , , , 10_1101-436634 174 21 S352 s352 NN 10_1101-436634 174 22 , , , 10_1101-436634 174 23 which which WDT 10_1101-436634 174 24 could could MD 10_1101-436634 174 25 be be VB 10_1101-436634 174 26 important important JJ 10_1101-436634 174 27 for for IN 10_1101-436634 174 28 the the DT 10_1101-436634 174 29 regulation regulation NN 10_1101-436634 174 30 of of IN 10_1101-436634 174 31 this this DT 10_1101-436634 174 32 gene42 gene42 VBN 10_1101-436634 174 33 . . . 
10_1101-436634 175 1 Based base VBN 10_1101-436634 175 2 on on IN 10_1101-436634 175 3 these these DT 10_1101-436634 175 4 findings finding NNS 10_1101-436634 175 5 , , , 10_1101-436634 175 6 RNF145 RNF145 NNP 10_1101-436634 175 7 may may MD 10_1101-436634 175 8 play play VB 10_1101-436634 175 9 a a DT 10_1101-436634 175 10 role role NN 10_1101-436634 175 11 similar similar JJ 10_1101-436634 175 12 to to IN 10_1101-436634 175 13 RNF139 RNF139 NNP 10_1101-436634 175 14 and and CC 10_1101-436634 175 15 may may MD 10_1101-436634 175 16 be be VB 10_1101-436634 175 17 an an DT 10_1101-436634 175 18 important important JJ 10_1101-436634 175 19 driver driver NN 10_1101-436634 175 20 event event NN 10_1101-436634 175 21 in in IN 10_1101-436634 175 22 certain certain JJ 10_1101-436634 175 23 tumor tumor NN 10_1101-436634 175 24 samples sample NNS 10_1101-436634 175 25 . . . 
10_1101-436634 177 31 While while IN 10_1101-436634 177 32 most most JJS 10_1101-436634 177 33 of of IN 10_1101-436634 177 34 our -PRON- PRP$ 10_1101-436634 177 35 analysis analysis NN 10_1101-436634 177 36 focused focus VBN 10_1101-436634 177 37 on on IN 10_1101-436634 177 38 splice splice NN 10_1101-436634 177 39 altering altering NN 10_1101-436634 177 40 variants variant NNS 10_1101-436634 177 41 that that WDT 10_1101-436634 177 42 resulted result VBD 10_1101-436634 177 43 in in IN 10_1101-436634 177 44 D d NN 10_1101-436634 177 45 , , , 10_1101-436634 177 46 A A NNP 10_1101-436634 177 47 , , , 10_1101-436634 177 48 NDA NDA NNP 10_1101-436634 177 
49 junctions junction NNS 10_1101-436634 177 50 , , , 10_1101-436634 177 51 we -PRON- PRP 10_1101-436634 177 52 also also RB 10_1101-436634 177 53 wanted want VBD 10_1101-436634 177 54 to to TO 10_1101-436634 177 55 investigate investigate VB 10_1101-436634 177 56 variants variant NNS 10_1101-436634 177 57 that that WDT 10_1101-436634 177 58 shifted shift VBD 10_1101-436634 177 59 the the DT 10_1101-436634 177 60 usage usage NN 10_1101-436634 177 61 of of IN 10_1101-436634 177 62 known know VBN 10_1101-436634 177 63 donor donor NN 10_1101-436634 177 64 and and CC 10_1101-436634 177 65 acceptor acceptor NN 10_1101-436634 177 66 sites site NNS 10_1101-436634 177 67 . . . 10_1101-436634 178 1 Through through IN 10_1101-436634 178 2 this this DT 10_1101-436634 178 3 analysis analysis NN 10_1101-436634 178 4 , , , 10_1101-436634 178 5 we -PRON- PRP 10_1101-436634 178 6 identified identify VBD 10_1101-436634 178 7 CDKN2A CDKN2A NNP 10_1101-436634 178 8 , , , 10_1101-436634 178 9 a a DT 10_1101-436634 178 10 tumor tumor NN 10_1101-436634 178 11 suppressor suppressor NN 10_1101-436634 178 12 gene gene NN 10_1101-436634 178 13 that that WDT 10_1101-436634 178 14 is be VBZ 10_1101-436634 178 15 frequently frequently RB 10_1101-436634 178 16 mutated mutate VBN 10_1101-436634 178 17 in in IN 10_1101-436634 178 18 numerous numerous JJ 10_1101-436634 178 19 cancers43 cancers43 NNP 10_1101-436634 178 20 , , , 10_1101-436634 178 21 to to TO 10_1101-436634 178 22 have have VB 10_1101-436634 178 23 several several JJ 10_1101-436634 178 24 variants variant NNS 10_1101-436634 178 25 that that WDT 10_1101-436634 178 26 led lead VBD 10_1101-436634 178 27 to to IN 10_1101-436634 178 28 alternate alternate JJ 10_1101-436634 178 29 donor donor NN 10_1101-436634 178 30 usage usage NN 10_1101-436634 178 31 ( ( -LRB- 10_1101-436634 178 32 Supplementary Supplementary NNP 10_1101-436634 178 33 Figure Figure NNP 10_1101-436634 178 34 14 14 CD 10_1101-436634 178 35 ) ) -RRB- 10_1101-436634 178 36 
. . . 10_1101-436634 179 1 When when WRB 10_1101-436634 179 2 these these DT 10_1101-436634 179 3 variants variant NNS 10_1101-436634 179 4 are be VBP 10_1101-436634 179 5 present present JJ 10_1101-436634 179 6 , , , 10_1101-436634 179 7 an an DT 10_1101-436634 179 8 alternate alternate RB 10_1101-436634 179 9 known know VBN 10_1101-436634 179 10 donor donor NN 10_1101-436634 179 11 site site NN 10_1101-436634 179 12 is be VBZ 10_1101-436634 179 13 used use VBN 10_1101-436634 179 14 that that WDT 10_1101-436634 179 15 leads lead VBZ 10_1101-436634 179 16 to to IN 10_1101-436634 179 17 the the DT 10_1101-436634 179 18 formation formation NN 10_1101-436634 179 19 of of IN 10_1101-436634 179 20 the the DT 10_1101-436634 179 21 transcript transcript NN 10_1101-436634 179 22 ENST00000579122.1 ENST00000579122.1 VBD 10_1101-436634 179 23 instead instead RB 10_1101-436634 179 24 of of IN 10_1101-436634 179 25 ENST00000304494.9 enst00000304494.9 NN 10_1101-436634 179 26 , , , 10_1101-436634 179 27 the the DT 10_1101-436634 179 28 transcript transcript NN 10_1101-436634 179 29 that that WDT 10_1101-436634 179 30 encodes encode VBZ 10_1101-436634 179 31 for for IN 10_1101-436634 179 32 p16ink4a p16ink4a NNP 10_1101-436634 179 33 , , , 10_1101-436634 179 34 a a DT 10_1101-436634 179 35 known know VBN 10_1101-436634 179 36 tumor tumor NN 10_1101-436634 179 37 suppressor suppressor NN 10_1101-436634 179 38 . . . 
10_1101-436634 180 1 The the DT 10_1101-436634 180 2 transcript transcript NN 10_1101-436634 180 3 that that WDT 10_1101-436634 180 4 results result VBZ 10_1101-436634 180 5 from from IN 10_1101-436634 180 6 use use NN 10_1101-436634 180 7 of of IN 10_1101-436634 180 8 this this DT 10_1101-436634 180 9 alternate alternate JJ 10_1101-436634 180 10 donor donor NN 10_1101-436634 180 11 site site NN 10_1101-436634 180 12 is be VBZ 10_1101-436634 180 13 missing miss VBG 10_1101-436634 180 14 the the DT 10_1101-436634 180 15 last last JJ 10_1101-436634 180 16 twenty twenty CD 10_1101-436634 180 17 - - HYPH 10_1101-436634 180 18 eight eight CD 10_1101-436634 180 19 amino amino JJ 10_1101-436634 180 20 acids acid NNS 10_1101-436634 180 21 that that WDT 10_1101-436634 180 22 form form VBP 10_1101-436634 180 23 the the DT 10_1101-436634 180 24 C c NN 10_1101-436634 180 25 - - HYPH 10_1101-436634 180 26 terminal terminal JJ 10_1101-436634 180 27 end end NN 10_1101-436634 180 28 of of IN 10_1101-436634 180 29 p16ink4a p16ink4a RB 10_1101-436634 180 30 . . . 
10_1101-436634 181 1 Notably notably RB 10_1101-436634 181 2 , , , 10_1101-436634 181 3 this this DT 10_1101-436634 181 4 removes remove VBZ 10_1101-436634 181 5 two two CD 10_1101-436634 181 6 phosphorylation phosphorylation NN 10_1101-436634 181 7 sites site NNS 10_1101-436634 181 8 within within IN 10_1101-436634 181 9 the the DT 10_1101-436634 181 10 p16 p16 NN 10_1101-436634 181 11 protein protein NN 10_1101-436634 181 12 , , , 10_1101-436634 181 13 S140 S140 NNP 10_1101-436634 181 14 and and CC 10_1101-436634 181 15 S152 S152 NNP 10_1101-436634 181 16 , , , 10_1101-436634 181 17 which which WDT 10_1101-436634 181 18 when when WRB 10_1101-436634 181 19 phosphorylated phosphorylate VBD 10_1101-436634 181 20 promotes promote VBZ 10_1101-436634 181 21 the the DT 10_1101-436634 181 22 association association NN 10_1101-436634 181 23 of of IN 10_1101-436634 181 24 p16ink4a p16ink4a NNP 10_1101-436634 181 25 with with IN 10_1101-436634 181 26 CDK444 CDK444 NNP 10_1101-436634 181 27 . . . 
10_1101-436634 182 1 This this DT 10_1101-436634 182 2 finding finding NN 10_1101-436634 182 3 highlights highlight VBZ 10_1101-436634 182 4 the the DT 10_1101-436634 182 5 importance importance NN 10_1101-436634 182 6 of of IN 10_1101-436634 182 7 including include VBG 10_1101-436634 182 8 known know VBN 10_1101-436634 182 9 transcripts transcript NNS 10_1101-436634 182 10 in in IN 10_1101-436634 182 11 alternative alternative JJ 10_1101-436634 182 12 splicing splicing NN 10_1101-436634 182 13 analyses analysis NNS 10_1101-436634 182 14 as as IN 10_1101-436634 182 15 variants variant NNS 10_1101-436634 182 16 may may MD 10_1101-436634 182 17 alter alter VB 10_1101-436634 182 18 splice splice NN 10_1101-436634 182 19 site site NN 10_1101-436634 182 20 usage usage NN 10_1101-436634 182 21 in in IN 10_1101-436634 182 22 a a DT 10_1101-436634 182 23 way way NN 10_1101-436634 182 24 that that WDT 10_1101-436634 182 25 results result VBZ 10_1101-436634 182 26 in in IN 10_1101-436634 182 27 a a DT 10_1101-436634 182 28 known known JJ 10_1101-436634 182 29 but but CC 10_1101-436634 182 30 pathogenic pathogenic JJ 10_1101-436634 182 31 transcript transcript NN 10_1101-436634 182 32 product product NN 10_1101-436634 182 33 . . . 10_1101-436634 183 1 Discussion Discussion NNP 10_1101-436634 183 2 Splice Splice NNP 10_1101-436634 183 3 associated associate VBD 10_1101-436634 183 4 variants variant NNS 10_1101-436634 183 5 are be VBP 10_1101-436634 183 6 often often RB 10_1101-436634 183 7 overlooked overlook VBN 10_1101-436634 183 8 in in IN 10_1101-436634 183 9 traditional traditional JJ 10_1101-436634 183 10 genomic genomic JJ 10_1101-436634 183 11 analysis analysis NN 10_1101-436634 183 12 . . . 
10_1101-436634 184 1 To to TO 10_1101-436634 184 2 address address VB 10_1101-436634 184 3 this this DT 10_1101-436634 184 4 limitation limitation NN 10_1101-436634 184 5 , , , 10_1101-436634 184 6 we -PRON- PRP 10_1101-436634 184 7 created create VBD 10_1101-436634 184 8 RegTools RegTools NNP 10_1101-436634 184 9 , , , 10_1101-436634 184 10 a a DT 10_1101-436634 184 11 software software NN 10_1101-436634 184 12 suite suite NN 10_1101-436634 184 13 for for IN 10_1101-436634 184 14 the the DT 10_1101-436634 184 15 analysis analysis NN 10_1101-436634 184 16 of of IN 10_1101-436634 184 17 variants variant NNS 10_1101-436634 184 18 and and CC 10_1101-436634 184 19 junctions junction NNS 10_1101-436634 184 20 in in IN 10_1101-436634 184 21 a a DT 10_1101-436634 184 22 splicing splicing NN 10_1101-436634 184 23 context context NN 10_1101-436634 184 24 . . . 10_1101-436634 185 1 By by IN 10_1101-436634 185 2 relying rely VBG 10_1101-436634 185 3 on on IN 10_1101-436634 185 4 well well RB 10_1101-436634 185 5 - - HYPH 10_1101-436634 185 6 established establish VBN 10_1101-436634 185 7 standards standard NNS 10_1101-436634 185 8 for for IN 10_1101-436634 185 9 analyzing analyze VBG 10_1101-436634 185 10 genomic genomic JJ 10_1101-436634 185 11 and and CC 10_1101-436634 185 12 transcriptomic transcriptomic JJ 10_1101-436634 185 13 data datum NNS 10_1101-436634 185 14 and and CC 10_1101-436634 185 15 allowing allow VBG 10_1101-436634 185 16 flexible flexible JJ 10_1101-436634 185 17 analysis analysis NN 10_1101-436634 185 18 parameters parameter NNS 10_1101-436634 185 19 , , , 10_1101-436634 185 20 we -PRON- PRP 10_1101-436634 185 21 enable enable VBP 10_1101-436634 185 22 users user NNS 10_1101-436634 185 23 to to TO 10_1101-436634 185 24 apply apply VB 10_1101-436634 185 25 RegTools RegTools NNP 10_1101-436634 185 26 to to IN 10_1101-436634 185 27 a a DT 10_1101-436634 185 28 wide wide JJ 10_1101-436634 185 29 set set NN 10_1101-436634 185 30 of of IN 10_1101-436634 185 31 
scientific scientific JJ 10_1101-436634 185 32 methodologies methodology NNS 10_1101-436634 185 33 and and CC 10_1101-436634 185 34 datasets dataset NNS 10_1101-436634 185 35 . . . 10_1101-436634 186 1 To to TO 10_1101-436634 186 2 ease ease VB 10_1101-436634 186 3 the the DT 10_1101-436634 186 4 use use NN 10_1101-436634 186 5 and and CC 10_1101-436634 186 6 integration integration NN 10_1101-436634 186 7 of of IN 10_1101-436634 186 8 RegTools RegTools NNP 10_1101-436634 186 9 into into IN 10_1101-436634 186 10 analysis analysis NN 10_1101-436634 186 11 workflows workflow NNS 10_1101-436634 186 12 , , , 10_1101-436634 186 13 we -PRON- PRP 10_1101-436634 186 14 provide provide VBP 10_1101-436634 186 15 documentation documentation NN 10_1101-436634 186 16 and and CC 10_1101-436634 186 17 example example NN 10_1101-436634 186 18 workflows workflow NNS 10_1101-436634 186 19 via via IN 10_1101-436634 186 20 ( ( -LRB- 10_1101-436634 186 21 regtools.org regtools.org CD 10_1101-436634 186 22 ) ) -RRB- 10_1101-436634 186 23 and and CC 10_1101-436634 186 24 provide provide VB 10_1101-436634 186 25 a a DT 10_1101-436634 186 26 Docker Docker NNP 10_1101-436634 186 27 image image NN 10_1101-436634 186 28 with with IN 10_1101-436634 186 29 all all DT 10_1101-436634 186 30 necessary necessary JJ 10_1101-436634 186 31 software software NN 10_1101-436634 186 32 installed instal VBN 10_1101-436634 186 33 . . . 
10_1101-436634 187 1 In in IN 10_1101-436634 187 2 order order NN 10_1101-436634 187 3 to to TO 10_1101-436634 187 4 demonstrate demonstrate VB 10_1101-436634 187 5 the the DT 10_1101-436634 187 6 utility utility NN 10_1101-436634 187 7 of of IN 10_1101-436634 187 8 our -PRON- PRP$ 10_1101-436634 187 9 tool tool NN 10_1101-436634 187 10 , , , 10_1101-436634 187 11 we -PRON- PRP 10_1101-436634 187 12 applied apply VBD 10_1101-436634 187 13 RegTools RegTools NNP 10_1101-436634 187 14 to to IN 10_1101-436634 187 15 9,173 9,173 CD 10_1101-436634 187 16 tumor tumor NN 10_1101-436634 187 17 samples sample NNS 10_1101-436634 187 18 across across IN 10_1101-436634 187 19 35 35 CD 10_1101-436634 187 20 tumor tumor NN 10_1101-436634 187 21 types type NNS 10_1101-436634 187 22 to to TO 10_1101-436634 187 23 profile profile VB 10_1101-436634 187 24 the the DT 10_1101-436634 187 25 landscape landscape NN 10_1101-436634 187 26 of of IN 10_1101-436634 187 27 this this DT 10_1101-436634 187 28 category category NN 10_1101-436634 187 29 of of IN 10_1101-436634 187 30 variants variant NNS 10_1101-436634 187 31 . . . 10_1101-436634 188 1 From from IN 10_1101-436634 188 2 this this DT 10_1101-436634 188 3 analysis analysis NN 10_1101-436634 188 4 , , , 10_1101-436634 188 5 we -PRON- PRP 10_1101-436634 188 6 report report VBP 10_1101-436634 188 7 133,987 133,987 CD 10_1101-436634 188 8 variants variant NNS 10_1101-436634 188 9 that that WDT 10_1101-436634 188 10 cause cause VBP 10_1101-436634 188 11 novel novel NN 10_1101-436634 188 12 splicing splicing NN 10_1101-436634 188 13 events event NNS 10_1101-436634 188 14 that that WDT 10_1101-436634 188 15 were be VBD 10_1101-436634 188 16 missed miss VBN 10_1101-436634 188 17 by by IN 10_1101-436634 188 18 VEP VEP NNP 10_1101-436634 188 19 or or CC 10_1101-436634 188 20 SpliceAI SpliceAI NNP 10_1101-436634 188 21 . . . 
10_1101-436634 189 1 Only only RB 10_1101-436634 189 2 1.4 1.4 CD 10_1101-436634 189 3 percent percent NN 10_1101-436634 189 4 of of IN 10_1101-436634 189 5 these these DT 10_1101-436634 189 6 mutations mutation NNS 10_1101-436634 189 7 were be VBD 10_1101-436634 189 8 previously previously RB 10_1101-436634 189 9 discovered discover VBN 10_1101-436634 189 10 by by IN 10_1101-436634 189 11 similar similar JJ 10_1101-436634 189 12 attempts attempt NNS 10_1101-436634 189 13 , , , 10_1101-436634 189 14 while while IN 10_1101-436634 189 15 98.6 98.6 CD 10_1101-436634 189 16 percent percent NN 10_1101-436634 189 17 are be VBP 10_1101-436634 189 18 novel novel JJ 10_1101-436634 189 19 findings finding NNS 10_1101-436634 189 20 . . . 10_1101-436634 190 1 We -PRON- PRP 10_1101-436634 190 2 demonstrate demonstrate VBP 10_1101-436634 190 3 that that IN 10_1101-436634 190 4 there there EX 10_1101-436634 190 5 are be VBP 10_1101-436634 190 6 splice splice NN 10_1101-436634 190 7 altering alter VBG 10_1101-436634 190 8 variants variant NNS 10_1101-436634 190 9 that that WDT 10_1101-436634 190 10 occur occur VBP 10_1101-436634 190 11 beyond beyond IN 10_1101-436634 190 12 the the DT 10_1101-436634 190 13 splice splice NN 10_1101-436634 190 14 site site NN 10_1101-436634 190 15 consensus consensus NN 10_1101-436634 190 16 sequence sequence NN 10_1101-436634 190 17 , , , 10_1101-436634 190 18 shift shift NN 10_1101-436634 190 19 transcript transcript NN 10_1101-436634 190 20 usage usage NN 10_1101-436634 190 21 between between IN 10_1101-436634 190 22 known know VBN 10_1101-436634 190 23 transcripts transcript NNS 10_1101-436634 190 24 , , , 10_1101-436634 190 25 and and CC 10_1101-436634 190 26 create create VB 10_1101-436634 190 27 novel novel JJ 10_1101-436634 190 28 exon exon NN 10_1101-436634 190 29 - - HYPH 10_1101-436634 190 30 exon exon NN 10_1101-436634 190 31 junctions junction NNS 10_1101-436634 190 32 that that WDT 10_1101-436634 190 33 have have VBP 10_1101-436634 190 
34 not not RB 10_1101-436634 190 35 been be VBN 10_1101-436634 190 36 previously previously RB 10_1101-436634 190 37 described describe VBN 10_1101-436634 190 38 . . . 10_1101-436634 191 1 Specifically specifically RB 10_1101-436634 191 2 , , , 10_1101-436634 191 3 we -PRON- PRP 10_1101-436634 191 4 describe describe VBP 10_1101-436634 191 5 notable notable JJ 10_1101-436634 191 6 findings finding NNS 10_1101-436634 191 7 within within IN 10_1101-436634 191 8 B2 B2 NNP 10_1101-436634 191 9 M M NNP 10_1101-436634 191 10 , , , 10_1101-436634 191 11 RNF145 RNF145 NNP 10_1101-436634 191 12 , , , 10_1101-436634 191 13 and and CC 10_1101-436634 191 14 CDKN2A. CDKN2A. NNP 10_1101-436634 192 1 These these DT 10_1101-436634 192 2 results result NNS 10_1101-436634 192 3 demonstrate demonstrate VBP 10_1101-436634 192 4 the the DT 10_1101-436634 192 5 utility utility NN 10_1101-436634 192 6 of of IN 10_1101-436634 192 7 RegTools RegTools NNP 10_1101-436634 192 8 in in IN 10_1101-436634 192 9 discovering discover VBG 10_1101-436634 192 10 novel novel JJ 10_1101-436634 192 11 splice splice NN 10_1101-436634 192 12 - - HYPH 10_1101-436634 192 13 altering alter VBG 10_1101-436634 192 14 mutations mutation NNS 10_1101-436634 192 15 and and CC 10_1101-436634 192 16 confirm confirm VB 10_1101-436634 192 17 the the DT 10_1101-436634 192 18 importance importance NN 10_1101-436634 192 19 of of IN 10_1101-436634 192 20 integrating integrate VBG 10_1101-436634 192 21 RNA RNA NNP 10_1101-436634 192 22 and and CC 10_1101-436634 192 23 DNA DNA NNP 10_1101-436634 192 24 sequencing sequencing NN 10_1101-436634 192 25 data datum NNS 10_1101-436634 192 26 in in IN 10_1101-436634 192 27 understanding understand VBG 10_1101-436634 192 28 the the DT 10_1101-436634 192 29 consequences consequence NNS 10_1101-436634 192 30 of of IN 10_1101-436634 192 31 somatic somatic JJ 10_1101-436634 192 32 mutations mutation NNS 10_1101-436634 192 33 in in IN 10_1101-436634 192 34 cancer cancer NN 10_1101-436634 
192 35 . . . 10_1101-436634 193 1 To to TO 10_1101-436634 193 2 allow allow VB 10_1101-436634 193 3 further further JJ 10_1101-436634 193 4 investigation investigation NN 10_1101-436634 193 5 of of IN 10_1101-436634 193 6 these these DT 10_1101-436634 193 7 identified identify VBN 10_1101-436634 193 8 events event NNS 10_1101-436634 193 9 , , , 10_1101-436634 193 10 we -PRON- PRP 10_1101-436634 193 11 make make VBP 10_1101-436634 193 12 all all DT 10_1101-436634 193 13 of of IN 10_1101-436634 193 14 our -PRON- PRP$ 10_1101-436634 193 15 annotated annotated JJ 10_1101-436634 193 16 result result NN 10_1101-436634 193 17 files file NNS 10_1101-436634 193 18 ( ( -LRB- 10_1101-436634 193 19 Supplemental supplemental JJ 10_1101-436634 193 20 Files Files NNP 10_1101-436634 193 21 1 1 CD 10_1101-436634 193 22 - - SYM 10_1101-436634 193 23 4 4 CD 10_1101-436634 193 24 ) ) -RRB- 10_1101-436634 193 25 and and CC 10_1101-436634 193 26 recurrence recurrence VB 10_1101-436634 193 27 analysis analysis NN 10_1101-436634 193 28 files file NNS 10_1101-436634 193 29 ( ( -LRB- 10_1101-436634 193 30 Supplemental supplemental JJ 10_1101-436634 193 31 Files Files NNP 10_1101-436634 193 32 5 5 CD 10_1101-436634 193 33 - - SYM 10_1101-436634 193 34 6 6 CD 10_1101-436634 193 35 ) ) -RRB- 10_1101-436634 193 36 available available JJ 10_1101-436634 193 37 . . . 
10_1101-436634 194 1 Understanding understand VBG 10_1101-436634 194 2 the the DT 10_1101-436634 194 3 splicing splicing NN 10_1101-436634 194 4 landscape landscape NN 10_1101-436634 194 5 is be VBZ 10_1101-436634 194 6 crucial crucial JJ 10_1101-436634 194 7 for for IN 10_1101-436634 194 8 unlocking unlock VBG 10_1101-436634 194 9 potential potential JJ 10_1101-436634 194 10 therapeutic therapeutic JJ 10_1101-436634 194 11 avenues avenue NNS 10_1101-436634 194 12 in in IN 10_1101-436634 194 13 precision precision NN 10_1101-436634 194 14 medicine medicine NN 10_1101-436634 194 15 and and CC 10_1101-436634 194 16 elucidating elucidate VBG 10_1101-436634 194 17 the the DT 10_1101-436634 194 18 basic basic JJ 10_1101-436634 194 19 mechanisms mechanism NNS 10_1101-436634 194 20 of of IN 10_1101-436634 194 21 splicing splicing NN 10_1101-436634 194 22 . . . 10_1101-436634 195 1 The the DT 10_1101-436634 195 2 exploration exploration NN 10_1101-436634 195 3 of of IN 10_1101-436634 195 4 novel novel JJ 10_1101-436634 195 5 tumor tumor NN 10_1101-436634 195 6 - - HYPH 10_1101-436634 195 7 specific specific JJ 10_1101-436634 195 8 junctions junction NNS 10_1101-436634 195 9 will will MD 10_1101-436634 195 10 undoubtedly undoubtedly RB 10_1101-436634 195 11 lead lead VB 10_1101-436634 195 12 to to IN 10_1101-436634 195 13 translational translational JJ 10_1101-436634 195 14 applications application NNS 10_1101-436634 195 15 , , , 10_1101-436634 195 16 from from IN 10_1101-436634 195 17 discovering discover VBG 10_1101-436634 195 18 novel novel JJ 10_1101-436634 195 19 tumor tumor NN 10_1101-436634 195 20 drivers driver NNS 10_1101-436634 195 21 , , , 10_1101-436634 195 22 diagnostic diagnostic JJ 10_1101-436634 195 23 and and CC 10_1101-436634 195 24 prognostic prognostic JJ 10_1101-436634 195 25 biomarkers biomarker NNS 10_1101-436634 195 26 , , , 10_1101-436634 195 27 and and CC 10_1101-436634 195 28 drug drug NN 10_1101-436634 195 29 targets target NNS 10_1101-436634 195 
30 , , , 10_1101-436634 195 31 to to IN 10_1101-436634 195 32 identifying identify VBG 10_1101-436634 195 33 a a DT 10_1101-436634 195 34 previously previously RB 10_1101-436634 195 35 untapped untapped JJ 10_1101-436634 195 36 source source NN 10_1101-436634 195 37 of of IN 10_1101-436634 195 38 neoantigens neoantigen NNS 10_1101-436634 195 39 for for IN 10_1101-436634 195 40 personalized personalized JJ 10_1101-436634 195 41 immunotherapy immunotherapy NN 10_1101-436634 195 42 . . . 10_1101-436634 196 1 While while IN 10_1101-436634 196 2 our -PRON- PRP$ 10_1101-436634 196 3 analysis analysis NN
10_1101-436634 197 31 focuses focus NNS 10_1101-436634 197 32 on on IN 10_1101-436634 197 33 splice splice NN 10_1101-436634 197 34 altering altering NN 10_1101-436634 197 35 variants variant NNS 10_1101-436634 197 36 within within IN 10_1101-436634 197 37 cancers cancer NNS 10_1101-436634 197 38 , , , 10_1101-436634 197 39 we -PRON- PRP 10_1101-436634 197 40 believe believe VBP 10_1101-436634 197 41 RegTools RegTools NNP 10_1101-436634 197 42 will will MD 10_1101-436634 197 43 play play VB 10_1101-436634 197 44 an an DT 10_1101-436634 197 45 important important JJ 10_1101-436634 197 46 role role NN 10_1101-436634 197 47 in in IN 10_1101-436634 197 48 answering answer VBG 10_1101-436634 197 49 this this DT 10_1101-436634
197 50 broad broad JJ 10_1101-436634 197 51 range range NN 10_1101-436634 197 52 of of IN 10_1101-436634 197 53 questions question NNS 10_1101-436634 197 54 by by IN 10_1101-436634 197 55 helping help VBG 10_1101-436634 197 56 users user NNS 10_1101-436634 197 57 extract extract VB 10_1101-436634 197 58 splicing splice VBG 10_1101-436634 197 59 information information NN 10_1101-436634 197 60 from from IN 10_1101-436634 197 61 transcriptome transcriptome DT 10_1101-436634 197 62 data datum NNS 10_1101-436634 197 63 and and CC 10_1101-436634 197 64 linking link VBG 10_1101-436634 197 65 it -PRON- PRP 10_1101-436634 197 66 to to IN 10_1101-436634 197 67 somatic somatic VB 10_1101-436634 197 68 ( ( -LRB- 10_1101-436634 197 69 or or CC 10_1101-436634 197 70 germline germline NN 10_1101-436634 197 71 ) ) -RRB- 10_1101-436634 197 72 variant variant JJ 10_1101-436634 197 73 calls call NNS 10_1101-436634 197 74 . . . 10_1101-436634 198 1 The the DT 10_1101-436634 198 2 computational computational JJ 10_1101-436634 198 3 efficiency efficiency NN 10_1101-436634 198 4 of of IN 10_1101-436634 198 5 RegTools RegTools NNP 10_1101-436634 198 6 and and CC 10_1101-436634 198 7 increasing increase VBG 10_1101-436634 198 8 availability availability NN 10_1101-436634 198 9 and and CC 10_1101-436634 198 10 size size NN 10_1101-436634 198 11 of of IN 10_1101-436634 198 12 such such JJ 10_1101-436634 198 13 datasets dataset NNS 10_1101-436634 198 14 may may MD 10_1101-436634 198 15 also also RB 10_1101-436634 198 16 allow allow VB 10_1101-436634 198 17 for for IN 10_1101-436634 198 18 improved improved JJ 10_1101-436634 198 19 understanding understanding NN 10_1101-436634 198 20 of of IN 10_1101-436634 198 21 splice splice NN 10_1101-436634 198 22 regulatory regulatory JJ 10_1101-436634 198 23 motifs motif NNS 10_1101-436634 198 24 that that WDT 10_1101-436634 198 25 have have VBP 10_1101-436634 198 26 proven prove VBN 10_1101-436634 198 27 difficult difficult JJ 10_1101-436634 198 28 to 
to TO 10_1101-436634 198 29 accurately accurately RB 10_1101-436634 198 30 define define VB 10_1101-436634 198 31 such such JJ 10_1101-436634 198 32 as as IN 10_1101-436634 198 33 exonic exonic JJ 10_1101-436634 198 34 and and CC 10_1101-436634 198 35 intronic intronic JJ 10_1101-436634 198 36 splicing splicing NN 10_1101-436634 198 37 enhancers enhancer NNS 10_1101-436634 198 38 and and CC 10_1101-436634 198 39 silencers silencer NNS 10_1101-436634 198 40 . . . 10_1101-436634 199 1 Any any DT 10_1101-436634 199 2 group group NN 10_1101-436634 199 3 with with IN 10_1101-436634 199 4 paired pair VBN 10_1101-436634 199 5 DNA DNA NNP 10_1101-436634 199 6 and and CC 10_1101-436634 199 7 RNA RNA NNP 10_1101-436634 199 8 - - HYPH 10_1101-436634 199 9 seq seq NN 10_1101-436634 199 10 data datum NNS 10_1101-436634 199 11 for for IN 10_1101-436634 199 12 the the DT 10_1101-436634 199 13 same same JJ 10_1101-436634 199 14 samples sample NNS 10_1101-436634 199 15 stands stand VBZ 10_1101-436634 199 16 to to TO 10_1101-436634 199 17 benefit benefit VB 10_1101-436634 199 18 from from IN 10_1101-436634 199 19 the the DT 10_1101-436634 199 20 functionality functionality NN 10_1101-436634 199 21 of of IN 10_1101-436634 199 22 RegTools RegTools NNP 10_1101-436634 199 23 . . . 10_1101-436634 200 1 Methods method NNS 10_1101-436634 200 2 Software software NN 10_1101-436634 200 3 implementation implementation NN 10_1101-436634 200 4 RegTools RegTools NNP 10_1101-436634 200 5 is be VBZ 10_1101-436634 200 6 written write VBN 10_1101-436634 200 7 in in IN 10_1101-436634 200 8 C++ C++ NNP 10_1101-436634 200 9 . . . 10_1101-436634 201 1 CMake CMake NNP 10_1101-436634 201 2 is be VBZ 10_1101-436634 201 3 used use VBN 10_1101-436634 201 4 to to TO 10_1101-436634 201 5 build build VB 10_1101-436634 201 6 the the DT 10_1101-436634 201 7 executable executable JJ 10_1101-436634 201 8 from from IN 10_1101-436634 201 9 source source NN 10_1101-436634 201 10 code code NN 10_1101-436634 201 11 . . . 
10_1101-436634 202 1 We -PRON- PRP 10_1101-436634 202 2 have have VBP 10_1101-436634 202 3 designed design VBN 10_1101-436634 202 4 the the DT 10_1101-436634 202 5 RegTools RegTools NNP 10_1101-436634 202 6 package package NN 10_1101-436634 202 7 to to TO 10_1101-436634 202 8 be be VB 10_1101-436634 202 9 self self NN 10_1101-436634 202 10 - - HYPH 10_1101-436634 202 11 contained contain VBN 10_1101-436634 202 12 in in IN 10_1101-436634 202 13 order order NN 10_1101-436634 202 14 to to TO 10_1101-436634 202 15 minimize minimize VB 10_1101-436634 202 16 external external JJ 10_1101-436634 202 17 software software NN 10_1101-436634 202 18 dependencies dependency NNS 10_1101-436634 202 19 . . . 10_1101-436634 203 1 A a DT 10_1101-436634 203 2 Unix Unix NNP 10_1101-436634 203 3 platform platform NN 10_1101-436634 203 4 with with IN 10_1101-436634 203 5 a a DT 10_1101-436634 203 6 C++ C++ NNP 10_1101-436634 203 7 compiler compiler NN 10_1101-436634 203 8 and and CC 10_1101-436634 203 9 CMake cmake NN 10_1101-436634 203 10 is be VBZ 10_1101-436634 203 11 the the DT 10_1101-436634 203 12 minimum minimum JJ 10_1101-436634 203 13 prerequisite prerequisite NN 10_1101-436634 203 14 for for IN 10_1101-436634 203 15 installing instal VBG 10_1101-436634 203 16 RegTools regtool NNS 10_1101-436634 203 17 . . . 
10_1101-436634 204 1 Documentation documentation NN 10_1101-436634 204 2 for for IN 10_1101-436634 204 3 RegTools RegTools NNP 10_1101-436634 204 4 is be VBZ 10_1101-436634 204 5 maintained maintain VBN 10_1101-436634 204 6 as as IN 10_1101-436634 204 7 text text NN 10_1101-436634 204 8 files file NNS 10_1101-436634 204 9 within within IN 10_1101-436634 204 10 the the DT 10_1101-436634 204 11 source source NN 10_1101-436634 204 12 repository repository NN 10_1101-436634 204 13 to to TO 10_1101-436634 204 14 minimize minimize VB 10_1101-436634 204 15 divergence divergence NN 10_1101-436634 204 16 from from IN 10_1101-436634 204 17 the the DT 10_1101-436634 204 18 code code NN 10_1101-436634 204 19 . . . 10_1101-436634 205 1 We -PRON- PRP 10_1101-436634 205 2 have have VBP 10_1101-436634 205 3 implemented implement VBN 10_1101-436634 205 4 common common JJ 10_1101-436634 205 5 file file NN 10_1101-436634 205 6 handling handle VBG 10_1101-436634 205 7 tasks task NNS 10_1101-436634 205 8 in in IN 10_1101-436634 205 9 RegTools RegTools NNP 10_1101-436634 205 10 with with IN 10_1101-436634 205 11 the the DT 10_1101-436634 205 12 help help NN 10_1101-436634 205 13 of of IN 10_1101-436634 205 14 open open JJ 10_1101-436634 205 15 - - HYPH 10_1101-436634 205 16 source source NN 10_1101-436634 205 17 code code NN 10_1101-436634 205 18 from from IN 10_1101-436634 205 19 Samtools Samtools NNP 10_1101-436634 205 20 / / SYM 10_1101-436634 205 21 HTSlib26 HTSlib26 NNP 10_1101-436634 205 22 and and CC 10_1101-436634 205 23 BEDTools45 bedtools45 VB 10_1101-436634 205 24 in in IN 10_1101-436634 205 25 an an DT 10_1101-436634 205 26 effort effort NN 10_1101-436634 205 27 to to TO 10_1101-436634 205 28 ensure ensure VB 10_1101-436634 205 29 fast fast JJ 10_1101-436634 205 30 performance performance NN 10_1101-436634 205 31 , , , 10_1101-436634 205 32 consistent consistent JJ 10_1101-436634 205 33 file file NN 10_1101-436634 205 34 handling handling NN 10_1101-436634 205 35 , , , 
10_1101-436634 205 36 and and CC 10_1101-436634 205 37 interoperability interoperability NN 10_1101-436634 205 38 with with IN 10_1101-436634 205 39 any any DT 10_1101-436634 205 40 aligner aligner NN 10_1101-436634 205 41 that that WDT 10_1101-436634 205 42 adheres adhere VBZ 10_1101-436634 205 43 to to IN 10_1101-436634 205 44 the the DT 10_1101-436634 205 45 BAM BAM NNP 10_1101-436634 205 46 specification specification NN 10_1101-436634 205 47 . . . 10_1101-436634 206 1 Statistical statistical JJ 10_1101-436634 206 2 tests test NNS 10_1101-436634 206 3 are be VBP 10_1101-436634 206 4 conducted conduct VBN 10_1101-436634 206 5 within within IN 10_1101-436634 206 6 RegTools RegTools NNP 10_1101-436634 206 7 using use VBG 10_1101-436634 206 8 the the DT 10_1101-436634 206 9 RMath RMath NNP 10_1101-436634 206 10 framework framework NN 10_1101-436634 206 11 . . . 10_1101-436634 207 1 Travis Travis NNP 10_1101-436634 207 2 CI CI NNP 10_1101-436634 207 3 and and CC 10_1101-436634 207 4 Coveralls Coveralls NNP 10_1101-436634 207 5 are be VBP 10_1101-436634 207 6 used use VBN 10_1101-436634 207 7 to to TO 10_1101-436634 207 8 automate automate VB 10_1101-436634 207 9 and and CC 10_1101-436634 207 10 monitor monitor VB 10_1101-436634 207 11 software software NN 10_1101-436634 207 12 compilation compilation NN 10_1101-436634 207 13 and and CC 10_1101-436634 207 14 unit unit NN 10_1101-436634 207 15 tests test NNS 10_1101-436634 207 16 to to TO 10_1101-436634 207 17 ensure ensure VB 10_1101-436634 207 18 software software NN 10_1101-436634 207 19 functionality functionality NN 10_1101-436634 207 20 . . . 
10_1101-436634 208 1 We -PRON- PRP 10_1101-436634 208 2 utilized utilize VBD 10_1101-436634 208 3 the the DT 10_1101-436634 208 4 Google Google NNP 10_1101-436634 208 5 Test Test NNP 10_1101-436634 208 6 framework framework NN 10_1101-436634 208 7 to to TO 10_1101-436634 208 8 write write VB 10_1101-436634 208 9 unit unit NN 10_1101-436634 208 10 tests test NNS 10_1101-436634 208 11 . . . 10_1101-436634 209 1 RegTools RegTools NNP 10_1101-436634 209 2 consists consist VBZ 10_1101-436634 209 3 of of IN 10_1101-436634 209 4 a a DT 10_1101-436634 209 5 core core NN 10_1101-436634 209 6 set set NN 10_1101-436634 209 7 of of IN 10_1101-436634 209 8 modules module NNS 10_1101-436634 209 9 for for IN 10_1101-436634 209 10 variant variant JJ 10_1101-436634 209 11 annotation annotation NN 10_1101-436634 209 12 , , , 10_1101-436634 209 13 junction junction NN 10_1101-436634 209 14 extraction extraction NN 10_1101-436634 209 15 , , , 10_1101-436634 209 16 junction junction NN 10_1101-436634 209 17 annotation annotation NN 10_1101-436634 209 18 , , , 10_1101-436634 209 19 and and CC 10_1101-436634 209 20 GTF GTF NNP 10_1101-436634 209 21 utilities utility NNS 10_1101-436634 209 22 . . . 
10_1101-436634 210 1 Higher high JJR 10_1101-436634 210 2 level level NN 10_1101-436634 210 3 modules module NNS 10_1101-436634 210 4 such such JJ 10_1101-436634 210 5 as as IN 10_1101-436634 210 6 cis cis NN 10_1101-436634 210 7 - - HYPH 10_1101-436634 210 8 splice splice NN 10_1101-436634 210 9 - - HYPH 10_1101-436634 210 10 effects effect NNS 10_1101-436634 210 11 make make VBP 10_1101-436634 210 12 use use NN 10_1101-436634 210 13 of of IN 10_1101-436634 210 14 the the DT 10_1101-436634 210 15 lower low JJR 10_1101-436634 210 16 level level NN 10_1101-436634 210 17 modules module NNS 10_1101-436634 210 18 to to TO 10_1101-436634 210 19 perform perform VB 10_1101-436634 210 20 more more RBR 10_1101-436634 210 21 complex complex JJ 10_1101-436634 210 22 analyses analysis NNS 10_1101-436634 210 23 . . . 10_1101-436634 211 1 We -PRON- PRP 10_1101-436634 211 2 hope hope VBP 10_1101-436634 211 3 that that IN 10_1101-436634 211 4 bioinformaticians bioinformatician NNS 10_1101-436634 211 5 familiar familiar JJ 10_1101-436634 211 6 with with IN 10_1101-436634 211 7 C C NNP 10_1101-436634 211 8 / / SYM 10_1101-436634 211 9 C++ C++ NNP 10_1101-436634 211 10 can can MD 10_1101-436634 211 11 re re VB 10_1101-436634 211 12 - - VB 10_1101-436634 211 13 use use VB 10_1101-436634 211 14 or or CC 10_1101-436634 211 15 adapt adapt VB 10_1101-436634 211 16 the the DT 10_1101-436634 211 17 RegTools RegTools NNP 10_1101-436634 211 18 code code NN 10_1101-436634 211 19 to to TO 10_1101-436634 211 20 implement implement VB 10_1101-436634 211 21 similar similar JJ 10_1101-436634 211 22 tasks task NNS 10_1101-436634 211 23 . . . 
10_1101-436634 212 1 Benchmarking benchmarke VBG 10_1101-436634 212 2 Performance Performance NNP 10_1101-436634 212 3 metrics metric NNS 10_1101-436634 212 4 were be VBD 10_1101-436634 212 5 calculated calculate VBN 10_1101-436634 212 6 for for IN 10_1101-436634 212 7 all all DT 10_1101-436634 212 8 RegTools RegTools NNP 10_1101-436634 212 9 commands command NNS 10_1101-436634 212 10 . . . 10_1101-436634 213 1 Each each DT 10_1101-436634 213 2 command command NN 10_1101-436634 213 3 was be VBD 10_1101-436634 213 4 run run VBN 10_1101-436634 213 5 with with IN 10_1101-436634 213 6 default default NN 10_1101-436634 213 7 parameters parameter NNS 10_1101-436634 213 8 on on IN 10_1101-436634 213 9 a a DT 10_1101-436634 213 10 single single JJ 10_1101-436634 213 11 blade blade NN 10_1101-436634 213 12 server server NN 10_1101-436634 213 13 ( ( -LRB- 10_1101-436634 213 14 Intel(R intel(r NN 10_1101-436634 213 15 ) ) -RRB- 10_1101-436634 213 16 Xeon(R Xeon(R NNP 10_1101-436634 213 17 ) ) -RRB- 10_1101-436634 213 18 CPU CPU NNP 10_1101-436634 213 19 E5 E5 NNP 10_1101-436634 213 20 - - HYPH 10_1101-436634 213 21 2660 2660 CD 10_1101-436634 213 22 v2 v2 NN 10_1101-436634 213 23 @ @ CC 10_1101-436634 213 24 2.20GHz 2.20ghz NN 10_1101-436634 213 25 ) ) -RRB- 10_1101-436634 213 26 with with IN 10_1101-436634 213 27 10 10 CD 10_1101-436634 213 28 GB GB NNS 10_1101-436634 213 29 of of IN 10_1101-436634 213 30 RAM ram NN 10_1101-436634 213 31 and and CC 10_1101-436634 213 32 10 10 CD 10_1101-436634 213 33 replicates replicate NNS 10_1101-436634 213 34 for for IN 10_1101-436634 213 35 each each DT 10_1101-436634 213 36 data data NN 10_1101-436634 213 37 point point NN 10_1101-436634 213 38 ( ( -LRB- 10_1101-436634 213 39 Supplementary Supplementary NNP 10_1101-436634 213 40 Figure Figure NNP 10_1101-436634 213 41 1 1 CD 10_1101-436634 213 42 ) ) -RRB- 10_1101-436634 213 43 . . . 
10_1101-436634 214 1 Specifically specifically RB 10_1101-436634 214 2 for for IN 10_1101-436634 214 3 cis cis NN 10_1101-436634 214 4 - - HYPH 10_1101-436634 214 5 splice splice NN 10_1101-436634 214 6 - - HYPH 10_1101-436634 214 7 effects effect NNS 10_1101-436634 214 8 identify identify VBP 10_1101-436634 214 9 , , , 10_1101-436634 214 10 we -PRON- PRP 10_1101-436634 214 11 started start VBD 10_1101-436634 214 12 with with IN 10_1101-436634 214 13 random random JJ 10_1101-436634 214 14 selections selection NNS 10_1101-436634 214 15 of of IN 10_1101-436634 214 16 somatic somatic JJ 10_1101-436634 214 17 variants variant NNS 10_1101-436634 214 18 , , , 10_1101-436634 214 19 ranging range VBG 10_1101-436634 214 20 from from IN 10_1101-436634 214 21 10,000 10,000 CD 10_1101-436634 214 22 - - SYM 10_1101-436634 214 23 1,500,000 1,500,000 CD 10_1101-436634 214 24 , , , 10_1101-436634 214 25 across across IN 10_1101-436634 214 26 8 8 CD 10_1101-436634 214 27 data data NN 10_1101-436634 214 28 subsets subset NNS 10_1101-436634 214 29 . . . 
10_1101-436634 215 1 Using use VBG 10_1101-436634 215 2 the the DT 10_1101-436634 215 3 output output NN 10_1101-436634 215 4 from from IN 10_1101-436634 215 5 cis cis NN 10_1101-436634 215 6 - - HYPH 10_1101-436634 215 7 splice splice NN 10_1101-436634 215 8 - - HYPH 10_1101-436634 215 9 effects effect NNS 10_1101-436634 215 10 identify identify VBP 10_1101-436634 215 11 , , , 10_1101-436634 215 12 variants variant NNS 10_1101-436634 215 13 annotate annotate JJ 10_1101-436634 215 14 was be VBD 10_1101-436634 215 15 run run VBN 10_1101-436634 215 16 on on IN 10_1101-436634 215 17 somatic somatic JJ 10_1101-436634 215 18 variants variant NNS 10_1101-436634 215 19 from from IN 10_1101-436634 215 20 the the DT 10_1101-436634 215 21 8 8 CD 10_1101-436634 215 22 subsets subset NNS 10_1101-436634 215 23 ( ( -LRB- 10_1101-436634 215 24 range range NN 10_1101-436634 215 25 : : : 10_1101-436634 215 26 0 0 CD 10_1101-436634 215 27 - - SYM 10_1101-436634 215 28 17,742 17,742 CD 10_1101-436634 215 29 ) ) -RRB- 10_1101-436634 215 30 predicted predict VBD 10_1101-436634 215 31 to to TO 10_1101-436634 215 32 have have VB 10_1101-436634 215 33 a a DT 10_1101-436634 215 34 splicing splicing NN 10_1101-436634 215 35 consequence consequence NN 10_1101-436634 215 36 . . . 
10_1101-436634 216 1 The the DT 10_1101-436634 216 2 function function NN 10_1101-436634 216 3 junctions junction NNS 10_1101-436634 216 4 extract extract NN 10_1101-436634 216 5 was be VBD 10_1101-436634 216 6 performed perform VBN 10_1101-436634 216 7 on on IN 10_1101-436634 216 8 the the DT 10_1101-436634 216 9 HCC1395 HCC1395 NNP 10_1101-436634 216 10 tumor tumor NN 10_1101-436634 216 11 RNA RNA NNP 10_1101-436634 216 12 - - HYPH 10_1101-436634 216 13 seq seq NN 10_1101-436634 216 14 data datum NNS 10_1101-436634 216 15 aligned align VBN 10_1101-436634 216 16 with with IN 10_1101-436634 216 17 HISAT HISAT NNP 10_1101-436634 216 18 to to IN 10_1101-436634 216 19 GRCh37 GRCh37 NNP 10_1101-436634 216 20 and and CC 10_1101-436634 216 21 randomly randomly RB 10_1101-436634 216 22 downsampled downsample VBN 10_1101-436634 216 23 at at IN 10_1101-436634 216 24 intervals interval NNS 10_1101-436634 216 25 ranging range VBG 10_1101-436634 216 26 from from IN 10_1101-436634 216 27 10 10 CD 10_1101-436634 216 28 - - SYM 10_1101-436634 216 29 100 100 CD 10_1101-436634 216 30 % % NN 10_1101-436634 216 31 . . . 10_1101-436634 217 1 Using use VBG 10_1101-436634 217 2 output output NN 10_1101-436634 217 3 from from IN 10_1101-436634 217 4 junctions junction NNS 10_1101-436634 217 5 extract extract NN 10_1101-436634 217 6 , , , 10_1101-436634 217 7 junctions junction NNS 10_1101-436634 217 8 annotate annotate NN 10_1101-436634 217 9 was be VBD 10_1101-436634 217 10 performed perform VBN 10_1101-436634 217 11 for for IN 10_1101-436634 217 12 7 7 CD 10_1101-436634 217 13 data datum NNS 10_1101-436634 217 14 subsets subset NNS 10_1101-436634 217 15 ranging range VBG 10_1101-436634 217 16 from from IN 10_1101-436634 217 17 1,000 1,000 CD 10_1101-436634 217 18 - - SYM 10_1101-436634 217 19 500,000 500,000 CD 10_1101-436634 217 20 randomly randomly RB 10_1101-436634 217 21 selected select VBN 10_1101-436634 217 22 junctions junction NNS 10_1101-436634 217 23 . . . 
10_1101-436634 219 31 Benchmark Benchmark NNP 10_1101-436634 219 32 tests test NNS 10_1101-436634 219 33 revealed reveal VBD 10_1101-436634 219 34 an an DT 10_1101-436634 219 35 approximately approximately RB 10_1101-436634 219 36 linear linear JJ 10_1101-436634 219 37 performance performance NN 10_1101-436634 219 38 for for IN 10_1101-436634 219 39 all all DT 10_1101-436634 219 40 functions function NNS 10_1101-436634 219 41 . . . 10_1101-436634 220 1 Variance variance NN 10_1101-436634 220 2 between between IN 10_1101-436634 220 3 real real JJ 10_1101-436634 220 4 and and CC 10_1101-436634 220 5 CPU cpu NN 10_1101-436634 220 6 time time NN 10_1101-436634 220 7 is be VBZ 10_1101-436634 220 8 highly highly RB 10_1101-436634 220 9 dependent dependent JJ 10_1101-436634 220 10 on on IN 10_1101-436634 220 11 the the DT 10_1101-436634 220 12 I i NN 10_1101-436634 220 13 / / SYM 10_1101-436634 220 14 O o UH 10_1101-436634 220 15 speed speed NN 10_1101-436634 220 16 of of IN 10_1101-436634 220 17 the the DT 10_1101-436634 220 18 write write JJ 10_1101-436634 220 19 - - HYPH 10_1101-436634 220 20 disk disk NN 10_1101-436634 220 21 and and CC 10_1101-436634 220 22 could could MD 10_1101-436634 220 23 account account VB 10_1101-436634 220 24 for for IN 10_1101-436634 220 25 artificially artificially RB 10_1101-436634 220 26 inflated inflate VBN 10_1101-436634 220 27 real real JJ 10_1101-436634 220 28 time time NN 10_1101-436634 220 29 values value NNS 10_1101-436634 220 30 given give VBN 10_1101-436634 220
31 multiple multiple JJ 10_1101-436634 220 32 jobs job NNS 10_1101-436634 220 33 writing write VBG 10_1101-436634 220 34 to to IN 10_1101-436634 220 35 the the DT 10_1101-436634 220 36 same same JJ 10_1101-436634 220 37 disk disk NN 10_1101-436634 220 38 at at IN 10_1101-436634 220 39 once once RB 10_1101-436634 220 40 . . . 10_1101-436634 221 1 The the DT 10_1101-436634 221 2 most most RBS 10_1101-436634 221 3 computationally computationally RB 10_1101-436634 221 4 expensive expensive JJ 10_1101-436634 221 5 function function NN 10_1101-436634 221 6 in in IN 10_1101-436634 221 7 a a DT 10_1101-436634 221 8 typical typical JJ 10_1101-436634 221 9 analysis analysis NN 10_1101-436634 221 10 workflow workflow NN 10_1101-436634 221 11 was be VBD 10_1101-436634 221 12 junctions junction NNS 10_1101-436634 221 13 extract extract NNP 10_1101-436634 221 14 , , , 10_1101-436634 221 15 which which WDT 10_1101-436634 221 16 on on IN 10_1101-436634 221 17 average average JJ 10_1101-436634 221 18 processed process VBD 10_1101-436634 221 19 33,091 33,091 CD 10_1101-436634 221 20 reads read NNS 10_1101-436634 221 21 / / SYM 10_1101-436634 221 22 second second NN 10_1101-436634 221 23 ( ( -LRB- 10_1101-436634 221 24 CPU cpu NN 10_1101-436634 221 25 ) ) -RRB- 10_1101-436634 221 26 and and CC 10_1101-436634 221 27 took take VBD 10_1101-436634 221 28 an an DT 10_1101-436634 221 29 average average NN 10_1101-436634 221 30 of of IN 10_1101-436634 221 31 43.4 43.4 CD 10_1101-436634 221 32 real real JJ 10_1101-436634 221 33 vs vs IN 10_1101-436634 221 34 41.7 41.7 CD 10_1101-436634 221 35 CPU cpu NN 10_1101-436634 221 36 minutes minute NNS 10_1101-436634 221 37 to to TO 10_1101-436634 221 38 run run VB 10_1101-436634 221 39 on on IN 10_1101-436634 221 40 a a DT 10_1101-436634 221 41 full full JJ 10_1101-436634 221 42 bam bam NN 10_1101-436634 221 43 file file NN 10_1101-436634 221 44 ( ( -LRB- 10_1101-436634 221 45 82,807,868 82,807,868 CD 10_1101-436634 221 46 reads read NNS 
10_1101-436634 221 47 total total NN 10_1101-436634 221 48 ) ) -RRB- 10_1101-436634 221 49 . . . 10_1101-436634 222 1 The the DT 10_1101-436634 222 2 function function NN 10_1101-436634 222 3 junctions junction NNS 10_1101-436634 222 4 annotate annotate NN 10_1101-436634 222 5 was be VBD 10_1101-436634 222 6 the the DT 10_1101-436634 222 7 next next JJ 10_1101-436634 222 8 most most RBS 10_1101-436634 222 9 computationally computationally RB 10_1101-436634 222 10 intensive intensive JJ 10_1101-436634 222 11 function function NN 10_1101-436634 222 12 and and CC 10_1101-436634 222 13 took take VBD 10_1101-436634 222 14 an an DT 10_1101-436634 222 15 average average NN 10_1101-436634 222 16 of of IN 10_1101-436634 222 17 33.0 33.0 CD 10_1101-436634 222 18 real/8.55 real/8.55 JJ 10_1101-436634 222 19 CPU cpu NN 10_1101-436634 222 20 minutes minute NNS 10_1101-436634 222 21 to to TO 10_1101-436634 222 22 run run VB 10_1101-436634 222 23 on on IN 10_1101-436634 222 24 500,000 500,000 CD 10_1101-436634 222 25 junctions junction NNS 10_1101-436634 222 26 , , , 10_1101-436634 222 27 processing process VBG 10_1101-436634 222 28 975 975 CD 10_1101-436634 222 29 junctions junction NNS 10_1101-436634 222 30 / / SYM 10_1101-436634 222 31 second second JJ 10_1101-436634 222 32 ( ( -LRB- 10_1101-436634 222 33 CPU cpu NN 10_1101-436634 222 34 ) ) -RRB- 10_1101-436634 222 35 . . . 
10_1101-436634 223 1 The the DT 10_1101-436634 223 2 other other JJ 10_1101-436634 223 3 functions function NNS 10_1101-436634 223 4 were be VBD 10_1101-436634 223 5 comparatively comparatively RB 10_1101-436634 223 6 faster fast JJR 10_1101-436634 223 7 with with IN 10_1101-436634 223 8 cis cis NN 10_1101-436634 223 9 - - HYPH 10_1101-436634 223 10 splice splice NN 10_1101-436634 223 11 - - HYPH 10_1101-436634 223 12 effects effect NNS 10_1101-436634 223 13 identify identify VBP 10_1101-436634 223 14 and and CC 10_1101-436634 223 15 variants variant NNS 10_1101-436634 223 16 annotate annotate VBP 10_1101-436634 223 17 able able JJ 10_1101-436634 223 18 to to TO 10_1101-436634 223 19 process process VB 10_1101-436634 223 20 3,105 3,105 CD 10_1101-436634 223 21 and and CC 10_1101-436634 223 22 118 118 CD 10_1101-436634 223 23 variants variant NNS 10_1101-436634 223 24 per per IN 10_1101-436634 223 25 second second NN 10_1101-436634 223 26 ( ( -LRB- 10_1101-436634 223 27 CPU cpu NN 10_1101-436634 223 28 ) ) -RRB- 10_1101-436634 223 29 , , , 10_1101-436634 223 30 respectively respectively RB 10_1101-436634 223 31 . . . 
10_1101-436634 224 1 To to TO 10_1101-436634 224 2 process process VB 10_1101-436634 224 3 a a DT 10_1101-436634 224 4 typical typical JJ 10_1101-436634 224 5 candidate candidate NN 10_1101-436634 224 6 variant variant JJ 10_1101-436634 224 7 list list NN 10_1101-436634 224 8 of of IN 10_1101-436634 224 9 1,500,000 1,500,000 CD 10_1101-436634 224 10 variants variant NNS 10_1101-436634 224 11 and and CC 10_1101-436634 224 12 a a DT 10_1101-436634 224 13 corresponding corresponding JJ 10_1101-436634 224 14 RNA RNA NNP 10_1101-436634 224 15 - - HYPH 10_1101-436634 224 16 seq seq NN 10_1101-436634 224 17 BAM BAM NNP 10_1101-436634 224 18 file file NN 10_1101-436634 224 19 of of IN 10_1101-436634 224 20 82,807,868 82,807,868 CD 10_1101-436634 224 21 reads read NNS 10_1101-436634 224 22 with with IN 10_1101-436634 224 23 cis cis NN 10_1101-436634 224 24 - - HYPH 10_1101-436634 224 25 splice splice NN 10_1101-436634 224 26 - - HYPH 10_1101-436634 224 27 effects effect NNS 10_1101-436634 224 28 identify identify VBP 10_1101-436634 224 29 takes take VBZ 10_1101-436634 224 30 ~ ~ NFP 10_1101-436634 224 31 8.20 8.20 CD 10_1101-436634 224 32 real/8.05 real/8.05 JJ 10_1101-436634 224 33 CPU cpu NN 10_1101-436634 224 34 minutes minute NNS 10_1101-436634 224 35 ( ( -LRB- 10_1101-436634 224 36 Supplementary Supplementary NNP 10_1101-436634 224 37 Figure Figure NNP 10_1101-436634 224 38 1 1 CD 10_1101-436634 224 39 ) ) -RRB- 10_1101-436634 224 40 . . . 
10_1101-436634 225 1 Performance performance NN 10_1101-436634 225 2 metrics metric NNS 10_1101-436634 225 3 were be VBD 10_1101-436634 225 4 also also RB 10_1101-436634 225 5 calculated calculate VBN 10_1101-436634 225 6 for for IN 10_1101-436634 225 7 the the DT 10_1101-436634 225 8 statistics statistic NNS 10_1101-436634 225 9 script script NN 10_1101-436634 225 10 and and CC 10_1101-436634 225 11 its -PRON- PRP$ 10_1101-436634 225 12 associated associated NNP 10_1101-436634 225 13 wrapper wrapper NNP 10_1101-436634 225 14 script script NN 10_1101-436634 225 15 that that WDT 10_1101-436634 225 16 handles handle NNS 10_1101-436634 225 17 dividing divide VBG 10_1101-436634 225 18 the the DT 10_1101-436634 225 19 variants variant NNS 10_1101-436634 225 20 into into IN 10_1101-436634 225 21 smaller small JJR 10_1101-436634 225 22 chunks chunk NNS 10_1101-436634 225 23 for for IN 10_1101-436634 225 24 processing processing NN 10_1101-436634 225 25 to to TO 10_1101-436634 225 26 limit limit VB 10_1101-436634 225 27 RAM ram NN 10_1101-436634 225 28 usage usage NN 10_1101-436634 225 29 . . . 
10_1101-436634 226 1 This this DT 10_1101-436634 226 2 command command NN 10_1101-436634 226 3 , , , 10_1101-436634 226 4 compare_junctions compare_junction NNS 10_1101-436634 226 5 , , , 10_1101-436634 226 6 was be VBD 10_1101-436634 226 7 benchmarked benchmarke VBN 10_1101-436634 226 8 in in IN 10_1101-436634 226 9 January January NNP 10_1101-436634 226 10 2020 2020 CD 10_1101-436634 226 11 using use VBG 10_1101-436634 226 12 Amazon Amazon NNP 10_1101-436634 226 13 Web Web NNP 10_1101-436634 226 14 Services Services NNPS 10_1101-436634 226 15 ( ( -LRB- 10_1101-436634 226 16 AWS AWS NNP 10_1101-436634 226 17 ) ) -RRB- 10_1101-436634 226 18 on on IN 10_1101-436634 226 19 a a DT 10_1101-436634 226 20 m5.4xlarge m5.4xlarge NNP 10_1101-436634 226 21 instance instance NN 10_1101-436634 226 22 , , , 10_1101-436634 226 23 based base VBN 10_1101-436634 226 24 on on IN 10_1101-436634 226 25 the the DT 10_1101-436634 226 26 Amazon Amazon NNP 10_1101-436634 226 27 Linux Linux NNP 10_1101-436634 226 28 2 2 CD 10_1101-436634 226 29 AMI AMI NNP 10_1101-436634 226 30 , , , 10_1101-436634 226 31 with with IN 10_1101-436634 226 32 64 64 CD 10_1101-436634 226 33 Gb Gb NNS 10_1101-436634 226 34 of of IN 10_1101-436634 226 35 RAM ram NN 10_1101-436634 226 36 , , , 10_1101-436634 226 37 16 16 CD 10_1101-436634 226 38 vCPUs vcpu NNS 10_1101-436634 226 39 , , , 10_1101-436634 226 40 and and CC 10_1101-436634 226 41 a a DT 10_1101-436634 226 42 mounted mount VBN 10_1101-436634 226 43 1 1 CD 10_1101-436634 226 44 TB TB NNP 10_1101-436634 226 45 SSD SSD NNP 10_1101-436634 226 46 EBS EBS NNP 10_1101-436634 226 47 volume volume NN 10_1101-436634 226 48 with with IN 10_1101-436634 226 49 3000 3000 CD 10_1101-436634 226 50 IOPS IOPS NNP 10_1101-436634 226 51 . . . 
10_1101-436634 227 1 These these DT 10_1101-436634 227 2 data datum NNS 10_1101-436634 227 3 were be VBD 10_1101-436634 227 4 generated generate VBN 10_1101-436634 227 5 from from IN 10_1101-436634 227 6 running run VBG 10_1101-436634 227 7 compare_junctions compare_junction NNS 10_1101-436634 227 8 on on IN 10_1101-436634 227 9 each each DT 10_1101-436634 227 10 of of IN 10_1101-436634 227 11 the the DT 10_1101-436634 227 12 included included JJ 10_1101-436634 227 13 cohorts cohort NNS 10_1101-436634 227 14 , , , 10_1101-436634 227 15 with with IN 10_1101-436634 227 16 the the DT 10_1101-436634 227 17 largest large JJS 10_1101-436634 227 18 being be VBG 10_1101-436634 227 19 our -PRON- PRP$ 10_1101-436634 227 20 BRCA BRCA NNP 10_1101-436634 227 21 cohort cohort NN 10_1101-436634 227 22 ( ( -LRB- 10_1101-436634 227 23 1022 1022 CD 10_1101-436634 227 24 sample sample NN 10_1101-436634 227 25 ) ) -RRB- 10_1101-436634 227 26 which which WDT 10_1101-436634 227 27 processed process VBD 10_1101-436634 227 28 3.64 3.64 CD 10_1101-436634 227 29 events event NNS 10_1101-436634 227 30 per per IN 10_1101-436634 227 31 second second NN 10_1101-436634 227 32 ( ( -LRB- 10_1101-436634 227 33 CPU cpu NN 10_1101-436634 227 34 ) ) -RRB- 10_1101-436634 227 35 . . . 
10_1101-436634 228 1 Using use VBG 10_1101-436634 228 2 RegTools RegTools NNP 10_1101-436634 228 3 to to TO 10_1101-436634 228 4 identify identify VB 10_1101-436634 228 5 cis cis NN 10_1101-436634 228 6 - - HYPH 10_1101-436634 228 7 acting acting NN 10_1101-436634 228 8 , , , 10_1101-436634 228 9 splice splice NN 10_1101-436634 228 10 altering altering NN 10_1101-436634 228 11 variants variant NNS 10_1101-436634 228 12 RegTools RegTools NNP 10_1101-436634 228 13 contains contain VBZ 10_1101-436634 228 14 three three CD 10_1101-436634 228 15 sub sub NN 10_1101-436634 228 16 - - HYPH 10_1101-436634 228 17 modules module NNS 10_1101-436634 228 18 : : : 10_1101-436634 228 19 “ " `` 10_1101-436634 228 20 variants variant NNS 10_1101-436634 228 21 ” " '' 10_1101-436634 228 22 , , , 10_1101-436634 228 23 “ " `` 10_1101-436634 228 24 junctions junction NNS 10_1101-436634 228 25 ” " '' 10_1101-436634 228 26 , , , 10_1101-436634 228 27 and and CC 10_1101-436634 228 28 “ " `` 10_1101-436634 228 29 cis cis NN 10_1101-436634 228 30 - - HYPH 10_1101-436634 228 31 splice splice NN 10_1101-436634 228 32 - - HYPH 10_1101-436634 228 33 effects effect NNS 10_1101-436634 228 34 ” " '' 10_1101-436634 228 35 . . . 
10_1101-436634 229 1 For for IN 10_1101-436634 229 2 complete complete JJ 10_1101-436634 229 3 instructions instruction NNS 10_1101-436634 229 4 on on IN 10_1101-436634 229 5 usage usage NN 10_1101-436634 229 6 , , , 10_1101-436634 229 7 including include VBG 10_1101-436634 229 8 a a DT 10_1101-436634 229 9 detailed detailed JJ 10_1101-436634 229 10 workflow workflow NN 10_1101-436634 229 11 for for IN 10_1101-436634 229 12 how how WRB 10_1101-436634 229 13 to to TO 10_1101-436634 229 14 analyze analyze VB 10_1101-436634 229 15 cohorts cohort NNS 10_1101-436634 229 16 using use VBG 10_1101-436634 229 17 RegTools RegTools NNP 10_1101-436634 229 18 , , , 10_1101-436634 229 19 please please UH 10_1101-436634 229 20 visit visit VB 10_1101-436634 229 21 regtools.org regtools.org ADD 10_1101-436634 229 22 . . . 10_1101-436634 230 1 Variants variant NNS 10_1101-436634 230 2 annotate annotate VBP 10_1101-436634 230 3 This this DT 10_1101-436634 230 4 command command NN 10_1101-436634 230 5 takes take VBZ 10_1101-436634 230 6 a a DT 10_1101-436634 230 7 list list NN 10_1101-436634 230 8 of of IN 10_1101-436634 230 9 variants variant NNS 10_1101-436634 230 10 in in IN 10_1101-436634 230 11 VCF VCF NNP 10_1101-436634 230 12 format format NN 10_1101-436634 230 13 . . . 10_1101-436634 231 1 The the DT 10_1101-436634 231 2 file file NN 10_1101-436634 231 3 should should MD 10_1101-436634 231 4 be be VB 10_1101-436634 231 5 gzipped gzippe VBN 10_1101-436634 231 6 and and CC 10_1101-436634 231 7 indexed index VBN 10_1101-436634 231 8 with with IN 10_1101-436634 231 9 Tabix46 Tabix46 NNP 10_1101-436634 231 10 . . . 
10_1101-436634 232 1 The the DT 10_1101-436634 232 2 user user NN 10_1101-436634 232 3 must must MD 10_1101-436634 232 4 also also RB 10_1101-436634 232 5 supply supply VB 10_1101-436634 232 6 a a DT 10_1101-436634 232 7 GTF gtf NN 10_1101-436634 232 8 file file NN 10_1101-436634 232 9 that that WDT 10_1101-436634 232 10 specifies specify VBZ 10_1101-436634 232 11 the the DT 10_1101-436634 232 12 reference reference NN 10_1101-436634 232 13 transcriptome transcriptome DT 10_1101-436634 232 14 used use VBN 10_1101-436634 232 15 to to TO 10_1101-436634 232 16 annotate annotate VB 10_1101-436634 232 17 the the DT 10_1101-436634 232 18 variants variant NNS 10_1101-436634 232 19 . . . 10_1101-436634 233 1 The the DT 10_1101-436634 233 2 INFO INFO NNP 10_1101-436634 233 3 column column NN 10_1101-436634 233 4 of of IN 10_1101-436634 233 5 each each DT 10_1101-436634 233 6 line line NN 10_1101-436634 233 7 in in IN 10_1101-436634 233 8 the the DT 10_1101-436634 233 9 VCF VCF NNP 10_1101-436634 233 10 is be VBZ 10_1101-436634 233 11 populated populate VBN 10_1101-436634 233 12 with with IN 10_1101-436634 233 13 comma comma NN 10_1101-436634 233 14 - - HYPH 10_1101-436634 233 15 separated separate VBN 10_1101-436634 233 16 lists list NNS 10_1101-436634 233 17 of of IN 10_1101-436634 233 18 the the DT 10_1101-436634 233 19 variant variant JJ 10_1101-436634 233 20 - - HYPH 10_1101-436634 233 21 overlapping overlap VBG 10_1101-436634 233 22 genes gene NNS 10_1101-436634 233 23 , , , 10_1101-436634 233 24 variant variant JJ 10_1101-436634 233 25 - - HYPH 10_1101-436634 233 26 overlapping overlapping NN 10_1101-436634 233 27 transcripts transcript NNS 10_1101-436634 233 28 , , , 10_1101-436634 233 29 the the DT 10_1101-436634 233 30 distance distance NN 10_1101-436634 233 31 between between IN 10_1101-436634 233 32 the the DT 10_1101-436634 233 33 variant variant NN 10_1101-436634 233 34 and and CC 10_1101-436634 233 35 the the DT 10_1101-436634 233 36 associated associate VBN 
10_1101-436634 233 37 exon exon NNP 10_1101-436634 233 38 edge edge NN 10_1101-436634 233 39 for for IN 10_1101-436634 233 40 each each DT 10_1101-436634 233 41 transcript transcript NN 10_1101-436634 233 42 ( ( -LRB- 10_1101-436634 233 43 i.e. i.e. FW 10_1101-436634 234 1 each each DT 10_1101-436634 234 2 start start NN 10_1101-436634 234 3 or or CC 10_1101-436634 234 4 end end NN 10_1101-436634 234 5 of of IN 10_1101-436634 234 6 an an DT 10_1101-436634 234 7 exon exon NN 10_1101-436634 234 8 whose whose WP$ 10_1101-436634 234 9 splice splice NN 10_1101-436634 234 10 variant variant JJ 10_1101-436634 234 11 window window NN 10_1101-436634 234 12 included include VBD 10_1101-436634 234 13 the the DT 10_1101-436634 234 14 variant variant NN 10_1101-436634 234 15 ) ) -RRB- 10_1101-436634 234 16 defined define VBN 10_1101-436634 234 17 as as IN 10_1101-436634 234 18 min(distance_from_start_of_exon min(distance_from_start_of_exon NNP 10_1101-436634 234 19 , , , 10_1101-436634 234 20 distance_from_end_of_exon distance_from_end_of_exon NNP 10_1101-436634 234 21 ) ) -RRB- 10_1101-436634 234 22 , , , 10_1101-436634 234 23 and and CC 10_1101-436634 234 24 the the DT 10_1101-436634 234 25 variant variant JJ 10_1101-436634 234 26 type type NN 10_1101-436634 234 27 for for IN 10_1101-436634 234 28 each each DT 10_1101-436634 234 29 transcript transcript NN 10_1101-436634 234 30 . . . 
10_1101-436634 235 1 Internally internally RB 10_1101-436634 235 2 , , , 10_1101-436634 235 3 this this DT 10_1101-436634 235 4 function function NN 10_1101-436634 235 5 relies rely VBZ 10_1101-436634 235 6 on on IN 10_1101-436634 235 7 HTSlib HTSlib NNP 10_1101-436634 235 8 to to TO 10_1101-436634 235 9 parse parse VB 10_1101-436634 235 10 the the DT 10_1101-436634 235 11 VCF VCF NNP 10_1101-436634 235 12 file file NN 10_1101-436634 235 13 and and CC 10_1101-436634 235 14 search search VB 10_1101-436634 235 15 for for IN 10_1101-436634 235 16 features feature NNS 10_1101-436634 235 17 in in IN 10_1101-436634 235 18 the the DT 10_1101-436634 235 19 GTF GTF NNP 10_1101-436634 235 20 file file NN 10_1101-436634 235 21 which which WDT 10_1101-436634 235 22 overlap overlap VBP 10_1101-436634 235 23 the the DT 10_1101-436634 235 24 variant variant NN 10_1101-436634 235 25 . . . 10_1101-436634 236 1 The the DT 10_1101-436634 236 2 splice splice NN 10_1101-436634 236 3 variant variant JJ 10_1101-436634 236 4 window window NN 10_1101-436634 236 5 size size NN 10_1101-436634 236 6 ( ( -LRB- 10_1101-436634 236 7 i.e. i.e. 
FW 10_1101-436634 237 1 the the DT 10_1101-436634 237 2 maximum maximum JJ 10_1101-436634 237 3 distance distance NN 10_1101-436634 237 4 from from IN 10_1101-436634 237 5 the the DT 10_1101-436634 237 6 edge edge NN 10_1101-436634 237 7 of of IN 10_1101-436634 237 8 an an DT 10_1101-436634 237 9 exon exon NN 10_1101-436634 237 10 used use VBN 10_1101-436634 237 11 to to TO 10_1101-436634 237 12 consider consider VB 10_1101-436634 237 13 a a DT 10_1101-436634 237 14 variant variant NN 10_1101-436634 237 15 as as IN 10_1101-436634 237 16 splicing splicing NN 10_1101-436634 237 17 - - HYPH 10_1101-436634 237 18 relevant relevant JJ 10_1101-436634 237 19 ) ) -RRB- 10_1101-436634 237 20 can can MD 10_1101-436634 237 21 be be VB 10_1101-436634 237 22 set set VBN 10_1101-436634 237 23 by by IN 10_1101-436634 237 24 the the DT 10_1101-436634 237 25 options option NNS 10_1101-436634 237 26 “ " `` 10_1101-436634 237 27 - - HYPH 10_1101-436634 237 28 e e NN 10_1101-436634 237 29 < < XX 10_1101-436634 237 30 number number NN 10_1101-436634 237 31 of of IN 10_1101-436634 237 32 bases basis NNS 10_1101-436634 237 33 > > XX 10_1101-436634 237 34 ” " '' 10_1101-436634 237 35 and and CC 10_1101-436634 237 36 “ " `` 10_1101-436634 237 37 -i -i : 10_1101-436634 237 38 < < XX 10_1101-436634 237 39 number number NN 10_1101-436634 237 40 of of IN 10_1101-436634 237 41 bases basis NNS 10_1101-436634 237 42 > > XX 10_1101-436634 237 43 ” " '' 10_1101-436634 237 44 for for IN 10_1101-436634 237 45 exonic exonic JJ 10_1101-436634 237 46 and and CC 10_1101-436634 237 47 intronic intronic JJ 10_1101-436634 237 48 variants variant NNS 10_1101-436634 237 49 , , , 10_1101-436634 237 50 respectively respectively RB 10_1101-436634 237 51 . . . 
10_1101-436634 238 1 The the DT 10_1101-436634 238 2 variant variant JJ 10_1101-436634 238 3 type type NN 10_1101-436634 238 4 for for IN 10_1101-436634 238 5 each each DT 10_1101-436634 238 6 variant variant NN 10_1101-436634 238 7 thus thus RB 10_1101-436634 238 8 depends depend VBZ 10_1101-436634 238 9 on on IN 10_1101-436634 238 10 the the DT 10_1101-436634 238 11 options option NNS 10_1101-436634 238 12 used use VBN 10_1101-436634 238 13 to to TO 10_1101-436634 238 14 set set VB 10_1101-436634 238 15 the the DT 10_1101-436634 238 16 splice splice NN 10_1101-436634 238 17 variant variant JJ 10_1101-436634 238 18 window window NN 10_1101-436634 238 19 size size NN 10_1101-436634 238 20 . . . 10_1101-436634 239 1 Variants variant NNS 10_1101-436634 239 2 captured capture VBN 10_1101-436634 239 3 by by IN 10_1101-436634 239 4 the the DT 10_1101-436634 239 5 window window NN 10_1101-436634 239 6 set set VBN 10_1101-436634 239 7 by by IN 10_1101-436634 239 8 “ " `` 10_1101-436634 239 9 -e -e : 10_1101-436634 239 10 ” " '' 10_1101-436634 239 11 or or CC 10_1101-436634 239 12 “ " `` 10_1101-436634 239 13 -i -i : 10_1101-436634 239 14 ” " '' 10_1101-436634 239 15 are be VBP 10_1101-436634 239 16 annotated annotate VBN 10_1101-436634 239 17 as as IN 
10_1101-436634 240 31 “ " `` 10_1101-436634 240 32 splicing_exonic splicing_exonic JJ 10_1101-436634 240 33 ” " '' 10_1101-436634 240 34 and and CC 10_1101-436634 240 35 “ " `` 10_1101-436634 240 36 splicing_intronic splicing_intronic JJ 10_1101-436634 240 37 ” " '' 10_1101-436634 240 38 , , , 10_1101-436634 240 39 respectively respectively RB 10_1101-436634 240 40 . . . 10_1101-436634 241 1 Alternatively alternatively RB 10_1101-436634 241 2 , , , 10_1101-436634 241 3 to to TO 10_1101-436634 241 4 analyze analyze VB 10_1101-436634 241 5 all all DT 10_1101-436634 241 6 exonic exonic JJ 10_1101-436634 241 7 or or CC 10_1101-436634 241 8 intronic intronic JJ 10_1101-436634 241 9 variants variant NNS 10_1101-436634 241 10 , , , 10_1101-436634 241 11 the the DT 10_1101-436634 241 12 “ " `` 10_1101-436634 241 13 -E -E : 10_1101-436634 241 14 ” " '' 10_1101-436634 241 15 and and CC 10_1101-436634 241 16 “ " `` 10_1101-436634 241 17 -I -i FW 10_1101-436634 241 18 ” " '' 10_1101-436634 241 19 options option NNS 10_1101-436634 241 20 can can MD 10_1101-436634 241 21 be be VB 10_1101-436634 241 22 used use VBN 10_1101-436634 241 23 . . . 
10_1101-436634 242 1 Otherwise otherwise RB 10_1101-436634 242 2 , , , 10_1101-436634 242 3 the the DT 10_1101-436634 242 4 “ " `` 10_1101-436634 242 5 -E -E : 10_1101-436634 242 6 ” " '' 10_1101-436634 242 7 and and CC 10_1101-436634 242 8 “ " `` 10_1101-436634 242 9 -I -i FW 10_1101-436634 242 10 ” " '' 10_1101-436634 242 11 options option NNS 10_1101-436634 242 12 themselves -PRON- PRP 10_1101-436634 242 13 do do VBP 10_1101-436634 242 14 not not RB 10_1101-436634 242 15 change change VB 10_1101-436634 242 16 the the DT 10_1101-436634 242 17 variant variant JJ 10_1101-436634 242 18 type type NN 10_1101-436634 242 19 annotation annotation NN 10_1101-436634 242 20 , , , 10_1101-436634 242 21 and and CC 10_1101-436634 242 22 variants variant NNS 10_1101-436634 242 23 found find VBN 10_1101-436634 242 24 in in IN 10_1101-436634 242 25 these these DT 10_1101-436634 242 26 windows window NNS 10_1101-436634 242 27 are be VBP 10_1101-436634 242 28 labeled label VBN 10_1101-436634 242 29 simply simply RB 10_1101-436634 242 30 as as IN 10_1101-436634 242 31 “ " `` 10_1101-436634 242 32 exonic exonic JJ 10_1101-436634 242 33 ” " '' 10_1101-436634 242 34 or or CC 10_1101-436634 242 35 “ " `` 10_1101-436634 242 36 intronic intronic JJ 10_1101-436634 242 37 ” " '' 10_1101-436634 242 38 . . . 
10_1101-436634 243 1 By by IN 10_1101-436634 243 2 default default NN 10_1101-436634 243 3 , , , 10_1101-436634 243 4 single single JJ 10_1101-436634 243 5 exon exon NN 10_1101-436634 243 6 transcripts transcript NNS 10_1101-436634 243 7 are be VBP 10_1101-436634 243 8 ignored ignore VBN 10_1101-436634 243 9 , , , 10_1101-436634 243 10 but but CC 10_1101-436634 243 11 they -PRON- PRP 10_1101-436634 243 12 can can MD 10_1101-436634 243 13 be be VB 10_1101-436634 243 14 included include VBN 10_1101-436634 243 15 with with IN 10_1101-436634 243 16 the the DT 10_1101-436634 243 17 “ " `` 10_1101-436634 243 18 -S -S NNP 10_1101-436634 243 19 ” " '' 10_1101-436634 243 20 option option NN 10_1101-436634 243 21 . . . 10_1101-436634 244 1 By by IN 10_1101-436634 244 2 default default NN 10_1101-436634 244 3 , , , 10_1101-436634 244 4 output output NN 10_1101-436634 244 5 is be VBZ 10_1101-436634 244 6 written write VBN 10_1101-436634 244 7 to to IN 10_1101-436634 244 8 STDOUT STDOUT NNP 10_1101-436634 244 9 in in IN 10_1101-436634 244 10 VCF VCF NNP 10_1101-436634 244 11 format format NN 10_1101-436634 244 12 . . . 10_1101-436634 245 1 To to TO 10_1101-436634 245 2 write write VB 10_1101-436634 245 3 to to IN 10_1101-436634 245 4 a a DT 10_1101-436634 245 5 file file NN 10_1101-436634 245 6 , , , 10_1101-436634 245 7 use use VB 10_1101-436634 245 8 the the DT 10_1101-436634 245 9 option option NN 10_1101-436634 245 10 “ " `` 10_1101-436634 245 11 -o -o NFP 10_1101-436634 245 12 < < XX 10_1101-436634 245 13 PATH PATH NNP 10_1101-436634 245 14 / / SYM 10_1101-436634 245 15 TO TO NNP 10_1101-436634 245 16 / / SYM 10_1101-436634 245 17 FILE FILE NNP 10_1101-436634 245 18 > > XX 10_1101-436634 245 19 ” " '' 10_1101-436634 245 20 . . . 
10_1101-436634 246 1 Junctions junction NNS 10_1101-436634 246 2 extract extract VBP 10_1101-436634 246 3 This this DT 10_1101-436634 246 4 command command NN 10_1101-436634 246 5 takes take VBZ 10_1101-436634 246 6 an an DT 10_1101-436634 246 7 alignment alignment NN 10_1101-436634 246 8 file file NN 10_1101-436634 246 9 containing contain VBG 10_1101-436634 246 10 aligned align VBN 10_1101-436634 246 11 RNA RNA NNP 10_1101-436634 246 12 - - HYPH 10_1101-436634 246 13 seq seq NN 10_1101-436634 246 14 reads read NNS 10_1101-436634 246 15 and and CC 10_1101-436634 246 16 infers infer VBZ 10_1101-436634 246 17 junctions junction NNS 10_1101-436634 246 18 ( ( -LRB- 10_1101-436634 246 19 i.e. i.e. FW 10_1101-436634 247 1 exon exon NNP 10_1101-436634 247 2 - - HYPH 10_1101-436634 247 3 exon exon NN 10_1101-436634 247 4 boundaries boundary NNS 10_1101-436634 247 5 ) ) -RRB- 10_1101-436634 247 6 based base VBN 10_1101-436634 247 7 on on IN 10_1101-436634 247 8 skipped skipped JJ 10_1101-436634 247 9 regions region NNS 10_1101-436634 247 10 in in IN 10_1101-436634 247 11 alignments alignment NNS 10_1101-436634 247 12 as as IN 10_1101-436634 247 13 determined determine VBN 10_1101-436634 247 14 by by IN 10_1101-436634 247 15 the the DT 10_1101-436634 247 16 CIGAR CIGAR NNP 10_1101-436634 247 17 string string NN 10_1101-436634 247 18 operator operator NN 10_1101-436634 247 19 codes code NNS 10_1101-436634 247 20 . . . 10_1101-436634 248 1 These these DT 10_1101-436634 248 2 junctions junction NNS 10_1101-436634 248 3 are be VBP 10_1101-436634 248 4 written write VBN 10_1101-436634 248 5 to to IN 10_1101-436634 248 6 STDOUT STDOUT NNP 10_1101-436634 248 7 in in IN 10_1101-436634 248 8 BED12 BED12 NNP 10_1101-436634 248 9 format format NN 10_1101-436634 248 10 . . . 
10_1101-436634 249 1 Alternatively alternatively RB 10_1101-436634 249 2 , , , 10_1101-436634 249 3 the the DT 10_1101-436634 249 4 output output NN 10_1101-436634 249 5 can can MD 10_1101-436634 249 6 be be VB 10_1101-436634 249 7 redirected redirect VBN 10_1101-436634 249 8 to to IN 10_1101-436634 249 9 a a DT 10_1101-436634 249 10 file file NN 10_1101-436634 249 11 with with IN 10_1101-436634 249 12 the the DT 10_1101-436634 249 13 “ " `` 10_1101-436634 249 14 -o -o NFP 10_1101-436634 249 15 < < XX 10_1101-436634 249 16 PATH PATH NNP 10_1101-436634 249 17 / / SYM 10_1101-436634 249 18 TO TO NNP 10_1101-436634 249 19 / / SYM 10_1101-436634 249 20 FILE FILE NNP 10_1101-436634 249 21 > > XX 10_1101-436634 249 22 ” " '' 10_1101-436634 249 23 . . . 10_1101-436634 250 1 RegTools RegTools NNP 10_1101-436634 250 2 ascertains ascertain NNS 10_1101-436634 250 3 strand strand NN 10_1101-436634 250 4 information information NN 10_1101-436634 250 5 based base VBN 10_1101-436634 250 6 on on IN 10_1101-436634 250 7 the the DT 10_1101-436634 250 8 XS XS NNP 10_1101-436634 250 9 tags tag NNS 10_1101-436634 250 10 set set VBN 10_1101-436634 250 11 by by IN 10_1101-436634 250 12 the the DT 10_1101-436634 250 13 aligner aligner NN 10_1101-436634 250 14 , , , 10_1101-436634 250 15 but but CC 10_1101-436634 250 16 can can MD 10_1101-436634 250 17 also also RB 10_1101-436634 250 18 determine determine VB 10_1101-436634 250 19 the the DT 10_1101-436634 250 20 inferred inferred JJ 10_1101-436634 250 21 strand strand NN 10_1101-436634 250 22 of of IN 10_1101-436634 250 23 transcription transcription NN 10_1101-436634 250 24 based base VBN 10_1101-436634 250 25 on on IN 10_1101-436634 250 26 the the DT 10_1101-436634 250 27 BAM BAM NNP 10_1101-436634 250 28 flags flag NNS 10_1101-436634 250 29 if if IN 10_1101-436634 250 30 a a DT 10_1101-436634 250 31 stranded stranded JJ 10_1101-436634 250 32 library library NN 10_1101-436634 250 33 strategy strategy NN 10_1101-436634 250 34 was be VBD 
10_1101-436634 250 35 employed employ VBN 10_1101-436634 250 36 . . . 10_1101-436634 251 1 In in IN 10_1101-436634 251 2 the the DT 10_1101-436634 251 3 latter latter JJ 10_1101-436634 251 4 case case NN 10_1101-436634 251 5 , , , 10_1101-436634 251 6 the the DT 10_1101-436634 251 7 strand strand NNP 10_1101-436634 251 8 specificity specificity NN 10_1101-436634 251 9 of of IN 10_1101-436634 251 10 the the DT 10_1101-436634 251 11 library library NN 10_1101-436634 251 12 can can MD 10_1101-436634 251 13 be be VB 10_1101-436634 251 14 provided provide VBN 10_1101-436634 251 15 using use VBG 10_1101-436634 251 16 “ " `` 10_1101-436634 251 17 -s -s NFP 10_1101-436634 251 18 < < XX 10_1101-436634 251 19 INT INT NNP 10_1101-436634 251 20 > > XX 10_1101-436634 251 21 ” " '' 10_1101-436634 251 22 where where WRB 10_1101-436634 251 23 0 0 CD 10_1101-436634 251 24 = = SYM 10_1101-436634 251 25 unstranded unstranded JJ 10_1101-436634 251 26 , , , 10_1101-436634 251 27 1 1 CD 10_1101-436634 251 28 = = SYM 10_1101-436634 251 29 first first JJ 10_1101-436634 251 30 - - HYPH 10_1101-436634 251 31 strand strand NNP 10_1101-436634 251 32 / / SYM 10_1101-436634 251 33 RF RF NNP 10_1101-436634 251 34 , , , 10_1101-436634 251 35 2 2 CD 10_1101-436634 251 36 = = SYM 10_1101-436634 251 37 second second JJ 10_1101-436634 251 38 - - HYPH 10_1101-436634 251 39 strand strand NNP 10_1101-436634 251 40 / / SYM 10_1101-436634 251 41 FR FR NNP 10_1101-436634 251 42 . . . 
10_1101-436634 252 1 We -PRON- PRP 10_1101-436634 252 2 suggest suggest VBP 10_1101-436634 252 3 that that IN 10_1101-436634 252 4 users user NNS 10_1101-436634 252 5 align align VBP 10_1101-436634 252 6 their -PRON- PRP$ 10_1101-436634 252 7 RNA RNA NNP 10_1101-436634 252 8 - - HYPH 10_1101-436634 252 9 seq seq NN 10_1101-436634 252 10 data datum NNS 10_1101-436634 252 11 with with IN 10_1101-436634 252 12 HISAT230 HISAT230 NNP 10_1101-436634 252 13 , , , 10_1101-436634 252 14 TopHat231 TopHat231 NNP 10_1101-436634 252 15 , , , 10_1101-436634 252 16 or or CC 10_1101-436634 252 17 STAR29 STAR29 NNP 10_1101-436634 252 18 , , , 10_1101-436634 252 19 as as IN 10_1101-436634 252 20 these these DT 10_1101-436634 252 21 are be VBP 10_1101-436634 252 22 the the DT 10_1101-436634 252 23 aligners aligner NNS 10_1101-436634 252 24 we -PRON- PRP 10_1101-436634 252 25 have have VBP 10_1101-436634 252 26 tested test VBN 10_1101-436634 252 27 to to IN 10_1101-436634 252 28 date date NN 10_1101-436634 252 29 . . . 
10_1101-436634 253 1 If if IN 10_1101-436634 253 2 RNA RNA NNP 10_1101-436634 253 3 - - HYPH 10_1101-436634 253 4 seq seq NN 10_1101-436634 253 5 data datum NNS 10_1101-436634 253 6 is be VBZ 10_1101-436634 253 7 unstranded unstranded JJ 10_1101-436634 253 8 and and CC 10_1101-436634 253 9 aligned align VBN 10_1101-436634 253 10 with with IN 10_1101-436634 253 11 STAR STAR NNP 10_1101-436634 253 12 , , , 10_1101-436634 253 13 users user NNS 10_1101-436634 253 14 must must MD 10_1101-436634 253 15 run run VB 10_1101-436634 253 16 STAR STAR NNP 10_1101-436634 253 17 with with IN 10_1101-436634 253 18 the the DT 10_1101-436634 253 19 --outSAMattributes --outsamattributes NN 10_1101-436634 253 20 option option NN 10_1101-436634 253 21 to to TO 10_1101-436634 253 22 include include VB 10_1101-436634 253 23 XS XS NNP 10_1101-436634 253 24 tags tag NNS 10_1101-436634 253 25 in in IN 10_1101-436634 253 26 the the DT 10_1101-436634 253 27 BAM BAM NNP 10_1101-436634 253 28 output output NN 10_1101-436634 253 29 . . . 10_1101-436634 254 1 Users user NNS 10_1101-436634 254 2 can can MD 10_1101-436634 254 3 set set VB 10_1101-436634 254 4 thresholds threshold NNS 10_1101-436634 254 5 for for IN 10_1101-436634 254 6 minimum minimum JJ 10_1101-436634 254 7 anchor anchor NN 10_1101-436634 254 8 length length NN 10_1101-436634 254 9 and and CC 10_1101-436634 254 10 minimum minimum NN 10_1101-436634 254 11 / / SYM 10_1101-436634 254 12 maximum maximum JJ 10_1101-436634 254 13 intron intron NNP 10_1101-436634 254 14 length length NN 10_1101-436634 254 15 . . . 
10_1101-436634 255 1 The the DT 10_1101-436634 255 2 minimum minimum JJ 10_1101-436634 255 3 anchor anchor NN 10_1101-436634 255 4 length length NN 10_1101-436634 255 5 determines determine VBZ 10_1101-436634 255 6 how how WRB 10_1101-436634 255 7 many many JJ 10_1101-436634 255 8 contiguous contiguous JJ 10_1101-436634 255 9 , , , 10_1101-436634 255 10 matched match VBD 10_1101-436634 255 11 base base NN 10_1101-436634 255 12 pairs pair NNS 10_1101-436634 255 13 on on IN 10_1101-436634 255 14 either either DT 10_1101-436634 255 15 side side NN 10_1101-436634 255 16 of of IN 10_1101-436634 255 17 the the DT 10_1101-436634 255 18 junction junction NN 10_1101-436634 255 19 are be VBP 10_1101-436634 255 20 required require VBN 10_1101-436634 255 21 to to TO 10_1101-436634 255 22 include include VB 10_1101-436634 255 23 it -PRON- PRP 10_1101-436634 255 24 in in IN 10_1101-436634 255 25 the the DT 10_1101-436634 255 26 final final JJ 10_1101-436634 255 27 output output NN 10_1101-436634 255 28 . . . 10_1101-436634 256 1 The the DT 10_1101-436634 256 2 required require VBN 10_1101-436634 256 3 overlap overlap NN 10_1101-436634 256 4 can can MD 10_1101-436634 256 5 be be VB 10_1101-436634 256 6 observed observe VBN 10_1101-436634 256 7 amongst amongst IN 10_1101-436634 256 8 separated separated JJ 10_1101-436634 256 9 reads read NNS 10_1101-436634 256 10 , , , 10_1101-436634 256 11 whose whose WP$ 10_1101-436634 256 12 union union NN 10_1101-436634 256 13 determines determine VBZ 10_1101-436634 256 14 the the DT 10_1101-436634 256 15 thickStart thickstart NN 10_1101-436634 256 16 and and CC 10_1101-436634 256 17 thickEnd thickEnd NNS 10_1101-436634 256 18 of of IN 10_1101-436634 256 19 the the DT 10_1101-436634 256 20 BED BED NNP 10_1101-436634 256 21 feature feature NN 10_1101-436634 256 22 . . . 
10_1101-436634 257 1 By by IN 10_1101-436634 257 2 default default NN 10_1101-436634 257 3 , , , 10_1101-436634 257 4 a a DT 10_1101-436634 257 5 junction junction NN 10_1101-436634 257 6 must must MD 10_1101-436634 257 7 have have VB 10_1101-436634 257 8 8 8 CD 10_1101-436634 257 9 bp bp NN 10_1101-436634 257 10 anchors anchor NNS 10_1101-436634 257 11 on on IN 10_1101-436634 257 12 each each DT 10_1101-436634 257 13 side side NN 10_1101-436634 257 14 to to TO 10_1101-436634 257 15 be be VB 10_1101-436634 257 16 counted count VBN 10_1101-436634 257 17 but but CC 10_1101-436634 257 18 this this DT 10_1101-436634 257 19 can can MD 10_1101-436634 257 20 be be VB 10_1101-436634 257 21 set set VBN 10_1101-436634 257 22 using use VBG 10_1101-436634 257 23 the the DT 10_1101-436634 257 24 option option NN 10_1101-436634 257 25 “ " `` 10_1101-436634 257 26 -a -a : 10_1101-436634 257 27 < < XX 10_1101-436634 257 28 minimum minimum JJ 10_1101-436634 257 29 anchor anchor NN 10_1101-436634 257 30 length length NN 10_1101-436634 257 31 > > XX 10_1101-436634 257 32 ” " '' 10_1101-436634 257 33 . . . 10_1101-436634 258 1 The the DT 10_1101-436634 258 2 intron intron NN 10_1101-436634 258 3 length length NN 10_1101-436634 258 4 is be VBZ 10_1101-436634 258 5 simply simply RB 10_1101-436634 258 6 the the DT 10_1101-436634 258 7 end end NN 10_1101-436634 258 8 coordinate coordinate NN 10_1101-436634 258 9 of of IN 10_1101-436634 258 10 the the DT 10_1101-436634 258 11 junction junction NN 10_1101-436634 258 12 minus minus CC 10_1101-436634 258 13 the the DT 10_1101-436634 258 14 start start NN 10_1101-436634 258 15 coordinate coordinate JJ 10_1101-436634 258 16 . . . 
10_1101-436634 259 1 By by IN 10_1101-436634 259 2 default default NN 10_1101-436634 259 3 , , , 10_1101-436634 259 4 the the DT 10_1101-436634 259 5 junction junction NN 10_1101-436634 259 6 must must MD 10_1101-436634 259 7 be be VB 10_1101-436634 259 8 between between IN 10_1101-436634 259 9 70 70 CD 10_1101-436634 259 10 bp bp NN 10_1101-436634 259 11 and and CC 10_1101-436634 259 12 500,000 500,000 CD 10_1101-436634 259 13 bp bp NN 10_1101-436634 259 14 , , , 10_1101-436634 259 15 but but CC 10_1101-436634 259 16 the the DT 10_1101-436634 259 17 minimum minimum NN 10_1101-436634 259 18 and and CC 10_1101-436634 259 19 maximum maximum NN 10_1101-436634 259 20 can can MD 10_1101-436634 259 21 be be VB 10_1101-436634 259 22 set set VBN 10_1101-436634 259 23 using use VBG 10_1101-436634 259 24 “ " `` 10_1101-436634 259 25 -i -i : 10_1101-436634 259 26 < < XX 10_1101-436634 259 27 minimum minimum JJ 10_1101-436634 259 28 intron intron NN 10_1101-436634 259 29 length length NN 10_1101-436634 259 30 > > XX 10_1101-436634 259 31 ” " '' 10_1101-436634 259 32 and and CC 10_1101-436634 259 33 “ " `` 10_1101-436634 259 34 -I -i ADD 10_1101-436634 259 35 < < XX 10_1101-436634 259 36 maximum maximum JJ 10_1101-436634 259 37 intron intron NNP 10_1101-436634 259 38 length length NNP 10_1101-436634 259 39 > > XX 10_1101-436634 259 40 ” " '' 10_1101-436634 259 41 , , , 10_1101-436634 259 42 respectively respectively RB 10_1101-436634 259 43 . . . 
10_1101-436634 260 1 For for IN 10_1101-436634 260 2 efficiency efficiency NN 10_1101-436634 260 3 , , , 10_1101-436634 260 4 this this DT 10_1101-436634 260 5 tool tool NN 10_1101-436634 260 6 can can MD 10_1101-436634 260 7 be be VB 10_1101-436634 260 8 used use VBN 10_1101-436634 260 9 to to TO 10_1101-436634 260 10 process process VB 10_1101-436634 260 11 only only RB 10_1101-436634 260 12 alignments alignment VBZ 10_1101-436634 260 13 in in IN 10_1101-436634 260 14 a a DT 10_1101-436634 260 15 particular particular JJ 10_1101-436634 260 16 region region NN 10_1101-436634 260 17 as as IN 10_1101-436634 260 18 opposed oppose VBN 10_1101-436634 260 19 to to IN 10_1101-436634 260 20 analyzing analyze VBG 10_1101-436634 260 21 the the DT 10_1101-436634 260 22 entire entire JJ 10_1101-436634 260 23 BAM bam NN 10_1101-436634 260 24 file file NN 10_1101-436634 260 25 . . . 10_1101-436634 261 1 The the DT 10_1101-436634 261 2 option option NN 10_1101-436634 261 3 “ " `` 10_1101-436634 261 4 -r -r XX 10_1101-436634 261 5 < < XX 10_1101-436634 261 6 chr>:-:- > XX 10_1101-436634 261 8 ” " `` 10_1101-436634 261 9 can can MD 10_1101-436634 261 10 be be VB 10_1101-436634 261 11 used use VBN 10_1101-436634 261 12 to to TO 10_1101-436634 261 13 set set VB 10_1101-436634 261 14 a a DT 10_1101-436634 261 15 single single JJ 10_1101-436634 261 16 contiguous contiguous JJ 10_1101-436634 261 17 region region NN 10_1101-436634 261 18 of of IN 10_1101-436634 261 19 interest interest NN 10_1101-436634 261 20 . . . 
10_1101-436634 262 1 Multiple multiple JJ 10_1101-436634 262 2 jobs job NNS 10_1101-436634 262 3 can can MD 10_1101-436634 262 4 be be VB 10_1101-436634 262 5 run run VBN 10_1101-436634 262 6 in in IN 10_1101-436634 262 7 parallel parallel NN 10_1101-436634 262 8 to to TO 10_1101-436634 262 9 analyze analyze VB 10_1101-436634 262 10 separate separate JJ 10_1101-436634 262 11 non non JJ 10_1101-436634 262 12 - - JJ 10_1101-436634 262 13 contiguous contiguous JJ 10_1101-436634 262 14 regions region NNS 10_1101-436634 262 15 . . . 10_1101-436634 263 1 Junctions junction NNS 10_1101-436634 263 2 annotate annotate VBP 10_1101-436634 263 3 This this DT 10_1101-436634 263 4 command command NN 10_1101-436634 263 5 takes take VBZ 10_1101-436634 263 6 a a DT 10_1101-436634 263 7 list list NN 10_1101-436634 263 8 of of IN 10_1101-436634 263 9 junctions junction NNS 10_1101-436634 263 10 in in IN 10_1101-436634 263 11 BED12 BED12 NNP 10_1101-436634 263 12 format format NN 10_1101-436634 263 13 as as IN 10_1101-436634 263 14 input input NN 10_1101-436634 263 15 and and CC 10_1101-436634 263 16 annotates annotate VBZ 10_1101-436634 263 17 them -PRON- PRP 10_1101-436634 263 18 with with IN 10_1101-436634 263 19 respect respect NN 10_1101-436634 263 20 to to IN 10_1101-436634 263 21 a a DT 10_1101-436634 263 22 reference reference NN 10_1101-436634 263 23 transcriptome transcriptome VBN 10_1101-436634 263 24 in in IN 10_1101-436634 263 25 GTF GTF NNP 10_1101-436634 263 26 format format NN 10_1101-436634 263 27 . . . 
10_1101-436634 264 1 The the DT 10_1101-436634 264 2 observed observed JJ 10_1101-436634 264 3 splice splice NN 10_1101-436634 264 4 - - HYPH 10_1101-436634 264 5 sites site NNS 10_1101-436634 264 6 used use VBN 10_1101-436634 264 7 are be VBP 10_1101-436634 264 8 recorded record VBN 10_1101-436634 264 9 based base VBN 10_1101-436634 264 10 on on IN 10_1101-436634 264 11 a a DT 10_1101-436634 264 12 reference reference NN 10_1101-436634 264 13 genome genome JJ 10_1101-436634 264 14 sequence sequence NN 10_1101-436634 264 15 in in IN 10_1101-436634 264 16 FASTA FASTA NNP 10_1101-436634 264 17 format format NN 10_1101-436634 264 18 . . . 10_1101-436634 265 1 The the DT 10_1101-436634 265 2 output output NN 10_1101-436634 265 3 is be VBZ 10_1101-436634 265 4 written write VBN 10_1101-436634 265 5 to to IN 10_1101-436634 265 6 STDOUT STDOUT NNP 10_1101-436634 265 7 in in IN 10_1101-436634 265 8 TSV TSV NNP 10_1101-436634 265 9 format format NN 10_1101-436634 265 10 , , , 10_1101-436634 265 11 with with IN 10_1101-436634 265 12 separate separate JJ 10_1101-436634 265 13 columns column NNS 10_1101-436634 265 14 for for IN 10_1101-436634 265 15 the the DT 10_1101-436634 265 16 number number NN 10_1101-436634 265 17 of of IN 10_1101-436634 265 18 splicing splicing NN 10_1101-436634 265 19 acceptors acceptor NNS 10_1101-436634 265 20 skipped skip VBD 10_1101-436634 265 21 , , , 10_1101-436634 265 22 number number NN 10_1101-436634 265 23 of of IN 10_1101-436634 265 24 splicing splicing NN 10_1101-436634 265 25 donors donor NNS 10_1101-436634 265 26 skipped skip VBD 10_1101-436634 265 27 , , , 10_1101-436634 265 28 number number NN 10_1101-436634 265 29 of of IN 10_1101-436634 265 30 exons exon NNS 10_1101-436634 265 31 skipped skip VBN 10_1101-436634 265 32 , , , 10_1101-436634 265 33 the the DT 10_1101-436634 265 34 junction junction NN 10_1101-436634 265 35 type type NN 10_1101-436634 265 36 , , , 10_1101-436634 265 37 whether whether IN 10_1101-436634 265 38 the the DT 
10_1101-436634 265 39 donor donor NN 10_1101-436634 265 40 site site NN 10_1101-436634 265 41 is be VBZ 10_1101-436634 265 42 known know VBN 10_1101-436634 265 43 , , , 10_1101-436634 265 44 whether whether IN 10_1101-436634 265 45 the the DT 10_1101-436634 265 46 acceptor acceptor NN 10_1101-436634 265 47 site site NN 10_1101-436634 265 48 is be VBZ 10_1101-436634 265 49 known know VBN 10_1101-436634 265 50 , , , 10_1101-436634 265 51 whether whether IN 10_1101-436634 265 52 this this DT 10_1101-436634 265 53 junction junction NN 10_1101-436634 265 54 is be VBZ 10_1101-436634 265 55 known know VBN 10_1101-436634 265 56 , , , 10_1101-436634 265 57 the the DT 10_1101-436634 265 58 overlapping overlapping NN 10_1101-436634 265 59 transcripts transcript NNS 10_1101-436634 265 60 , , , 10_1101-436634 265 61 and and CC 10_1101-436634 265 62 the the DT 10_1101-436634 265 63 overlapping overlap VBG 10_1101-436634 265 64 genes gene NNS 10_1101-436634 265 65 , , , 10_1101-436634 265 66 in in IN 10_1101-436634 265 67 addition addition NN 10_1101-436634 265 68 to to IN 10_1101-436634 265 69 the the DT 10_1101-436634 265 70 chromosome chromosome NN 10_1101-436634 265 71 , , , 10_1101-436634 265 72 start start VB 10_1101-436634 265 73 , , , 10_1101-436634 265 74 stop stop VB 10_1101-436634 265 75 , , , 10_1101-436634 265 76 junction junction NN 10_1101-436634 265 77 name name NN 10_1101-436634 265 78 , , , 10_1101-436634 265 79 junction junction NN 10_1101-436634 265 80 score score NN 10_1101-436634 265 81 , , , 10_1101-436634 265 82 and and CC 10_1101-436634 265 83 strand strand NNP 10_1101-436634 265 84 taken take VBN 10_1101-436634 265 85 from from IN 10_1101-436634 265 86 the the DT 10_1101-436634 265 87 input input NN 10_1101-436634 265 88 BED12 BED12 NNP 10_1101-436634 265 89 file file NN 10_1101-436634 265 90 . . . 
10_1101-436634 266 1 This this DT 10_1101-436634 266 2 output output NN 10_1101-436634 266 3 can can MD 10_1101-436634 266 4 be be VB 
10_1101-436634 267 31 redirected redirect VBD 10_1101-436634 267 32 to to IN 10_1101-436634 267 33 a a DT 10_1101-436634 267 34 file file NN 10_1101-436634 267 35 with with IN 10_1101-436634 267 36 “ " `` 10_1101-436634 267 37 -o -o , 10_1101-436634 267 38 /PATH /PATH . 10_1101-436634 267 39 / / SYM 10_1101-436634 267 40 TO TO NNP 10_1101-436634 267 41 / / SYM 10_1101-436634 267 42 FILE FILE NNP 10_1101-436634 267 43 ” " '' 10_1101-436634 267 44 . . . 
10_1101-436634 268 1 By by IN 10_1101-436634 268 2 default default NN 10_1101-436634 268 3 , , , 10_1101-436634 268 4 single single JJ 10_1101-436634 268 5 exon exon NN 10_1101-436634 268 6 transcripts transcript NNS 10_1101-436634 268 7 are be VBP 10_1101-436634 268 8 ignored ignore VBN 10_1101-436634 268 9 in in IN 10_1101-436634 268 10 the the DT 10_1101-436634 268 11 GTF gtf NN 10_1101-436634 268 12 but but CC 10_1101-436634 268 13 can can MD 10_1101-436634 268 14 be be VB 10_1101-436634 268 15 included include VBN 10_1101-436634 268 16 with with IN 10_1101-436634 268 17 the the DT 10_1101-436634 268 18 option option NN 10_1101-436634 268 19 “ " `` 10_1101-436634 268 20 -S -S NNP 10_1101-436634 268 21 ” " '' 10_1101-436634 268 22 . . . 10_1101-436634 269 1 Cis Cis NNP 10_1101-436634 269 2 - - HYPH 10_1101-436634 269 3 splice splice NN 10_1101-436634 269 4 - - HYPH 10_1101-436634 269 5 effects effect NNS 10_1101-436634 269 6 identify identify VBP 10_1101-436634 269 7 This this DT 10_1101-436634 269 8 command command NN 10_1101-436634 269 9 combines combine VBZ 10_1101-436634 269 10 the the DT 10_1101-436634 269 11 above above JJ 10_1101-436634 269 12 utilities utility NNS 10_1101-436634 269 13 into into IN 10_1101-436634 269 14 a a DT 10_1101-436634 269 15 pipeline pipeline NN 10_1101-436634 269 16 for for IN 10_1101-436634 269 17 identifying identify VBG 10_1101-436634 269 18 variants variant NNS 10_1101-436634 269 19 which which WDT 10_1101-436634 269 20 may may MD 10_1101-436634 269 21 cause cause VB 10_1101-436634 269 22 aberrant aberrant JJ 10_1101-436634 269 23 splicing splicing NN 10_1101-436634 269 24 events event NNS 10_1101-436634 269 25 by by IN 10_1101-436634 269 26 altering alter VBG 10_1101-436634 269 27 splicing splicing NN 10_1101-436634 269 28 motifs motif NNS 10_1101-436634 269 29 in in IN 10_1101-436634 269 30 cis cis NNP 10_1101-436634 269 31 . . . 
10_1101-436634 270 1 As as IN 10_1101-436634 270 2 such such JJ 10_1101-436634 270 3 , , , 10_1101-436634 270 4 it -PRON- PRP 10_1101-436634 270 5 relies rely VBZ 10_1101-436634 270 6 on on IN 10_1101-436634 270 7 essentially essentially RB 10_1101-436634 270 8 the the DT 10_1101-436634 270 9 same same JJ 10_1101-436634 270 10 inputs input NNS 10_1101-436634 270 11 : : : 10_1101-436634 270 12 a a DT 10_1101-436634 270 13 gzipped gzipped JJ 10_1101-436634 270 14 and and CC 10_1101-436634 270 15 Tabix Tabix NNP 10_1101-436634 270 16 - - HYPH 10_1101-436634 270 17 indexed index VBN 10_1101-436634 270 18 VCF VCF NNP 10_1101-436634 270 19 file file NN 10_1101-436634 270 20 containing contain VBG 10_1101-436634 270 21 a a DT 10_1101-436634 270 22 list list NN 10_1101-436634 270 23 of of IN 10_1101-436634 270 24 variants variant NNS 10_1101-436634 270 25 , , , 10_1101-436634 270 26 an an DT 10_1101-436634 270 27 alignment alignment NN 10_1101-436634 270 28 file file NN 10_1101-436634 270 29 containing contain VBG 10_1101-436634 270 30 aligned align VBN 10_1101-436634 270 31 RNA RNA NNP 10_1101-436634 270 32 - - HYPH 10_1101-436634 270 33 seq seq NN 10_1101-436634 270 34 reads read NNS 10_1101-436634 270 35 , , , 10_1101-436634 270 36 a a DT 10_1101-436634 270 37 GTF gtf NN 10_1101-436634 270 38 file file NN 10_1101-436634 270 39 containing contain VBG 10_1101-436634 270 40 the the DT 10_1101-436634 270 41 reference reference NN 10_1101-436634 270 42 transcriptome transcriptome DT 10_1101-436634 270 43 of of IN 10_1101-436634 270 44 interest interest NN 10_1101-436634 270 45 , , , 10_1101-436634 270 46 and and CC 10_1101-436634 270 47 a a DT 10_1101-436634 270 48 FASTA FASTA NNP 10_1101-436634 270 49 file file NN 10_1101-436634 270 50 containing contain VBG 10_1101-436634 270 51 the the DT 10_1101-436634 270 52 reference reference NN 10_1101-436634 270 53 genome genome JJ 10_1101-436634 270 54 sequence sequence NN 10_1101-436634 270 55 of of IN 10_1101-436634 270 56 
interest interest NN 10_1101-436634 270 57 . . . 10_1101-436634 271 1 First first RB 10_1101-436634 271 2 , , , 10_1101-436634 271 3 the the DT 10_1101-436634 271 4 list list NN 10_1101-436634 271 5 of of IN 10_1101-436634 271 6 variants variant NNS 10_1101-436634 271 7 is be VBZ 10_1101-436634 271 8 annotated annotate VBN 10_1101-436634 271 9 . . . 10_1101-436634 272 1 The the DT 10_1101-436634 272 2 splice splice NN 10_1101-436634 272 3 variant variant JJ 10_1101-436634 272 4 window window NN 10_1101-436634 272 5 size size NN 10_1101-436634 272 6 is be VBZ 10_1101-436634 272 7 set set VBN 10_1101-436634 272 8 using use VBG 10_1101-436634 272 9 the the DT 10_1101-436634 272 10 options option NNS 10_1101-436634 272 11 “ " `` 10_1101-436634 272 12 - - HYPH 10_1101-436634 272 13 e e NN 10_1101-436634 272 14 ” " '' 10_1101-436634 272 15 , , , 10_1101-436634 272 16 “ " `` 10_1101-436634 272 17 -i -i : 10_1101-436634 272 18 ” " '' 10_1101-436634 272 19 , , , 10_1101-436634 272 20 “ " `` 10_1101-436634 272 21 -E -E : 10_1101-436634 272 22 ” " '' 10_1101-436634 272 23 , , , 10_1101-436634 272 24 and and CC 10_1101-436634 272 25 “ " `` 10_1101-436634 272 26 -I -i FW 10_1101-436634 272 27 ” " '' 10_1101-436634 272 28 , , , 10_1101-436634 272 29 just just RB 10_1101-436634 272 30 as as IN 10_1101-436634 272 31 in in IN 10_1101-436634 272 32 variants variant NNS 10_1101-436634 272 33 annotate annotate JJ 10_1101-436634 272 34 . . . 10_1101-436634 273 1 The the DT 10_1101-436634 273 2 splice splice NN 10_1101-436634 273 3 junction junction NN 10_1101-436634 273 4 region region NN 10_1101-436634 273 5 size size NN 10_1101-436634 273 6 ( ( -LRB- 10_1101-436634 273 7 i.e. i.e. 
FW 10_1101-436634 274 1 the the DT 10_1101-436634 274 2 range range NN 10_1101-436634 274 3 around around IN 10_1101-436634 274 4 a a DT 10_1101-436634 274 5 particular particular JJ 10_1101-436634 274 6 variant variant NN 10_1101-436634 274 7 in in IN 10_1101-436634 274 8 which which WDT 10_1101-436634 274 9 an an DT 10_1101-436634 274 10 overlapping overlap VBG 10_1101-436634 274 11 junction junction NN 10_1101-436634 274 12 is be VBZ 10_1101-436634 274 13 associated associate VBN 10_1101-436634 274 14 with with IN 10_1101-436634 274 15 the the DT 10_1101-436634 274 16 variant variant NN 10_1101-436634 274 17 ) ) -RRB- 10_1101-436634 274 18 can can MD 10_1101-436634 274 19 be be VB 10_1101-436634 274 20 set set VBN 10_1101-436634 274 21 using use VBG 10_1101-436634 274 22 “ " `` 10_1101-436634 274 23 -w -w : 10_1101-436634 274 24 < < XX 10_1101-436634 274 25 splice splice NN 10_1101-436634 274 26 junction junction NN 10_1101-436634 274 27 region region NN 10_1101-436634 274 28 size size NN 10_1101-436634 274 29 > > XX 10_1101-436634 274 30 ” " '' 10_1101-436634 274 31 . . . 
10_1101-436634 275 1 By by IN 10_1101-436634 275 2 default default NN 10_1101-436634 275 3 , , , 10_1101-436634 275 4 this this DT 10_1101-436634 275 5 range range NN 10_1101-436634 275 6 is be VBZ 10_1101-436634 275 7 not not RB 10_1101-436634 275 8 a a DT 10_1101-436634 275 9 particular particular JJ 10_1101-436634 275 10 number number NN 10_1101-436634 275 11 of of IN 10_1101-436634 275 12 bases basis NNS 10_1101-436634 275 13 but but CC 10_1101-436634 275 14 is be VBZ 10_1101-436634 275 15 calculated calculate VBN 10_1101-436634 275 16 individually individually RB 10_1101-436634 275 17 for for IN 10_1101-436634 275 18 each each DT 10_1101-436634 275 19 variant variant JJ 10_1101-436634 275 20 , , , 10_1101-436634 275 21 depending depend VBG 10_1101-436634 275 22 on on IN 10_1101-436634 275 23 the the DT 10_1101-436634 275 24 variant variant JJ 10_1101-436634 275 25 type type NN 10_1101-436634 275 26 annotation annotation NN 10_1101-436634 275 27 . . . 10_1101-436634 276 1 For for IN 10_1101-436634 276 2 “ " `` 10_1101-436634 276 3 splicing_exonic splicing_exonic JJ 10_1101-436634 276 4 ” " '' 10_1101-436634 276 5 , , , 10_1101-436634 276 6 “ " `` 10_1101-436634 276 7 splicing_intronic splicing_intronic JJ 10_1101-436634 276 8 ” " '' 10_1101-436634 276 9 , , , 10_1101-436634 276 10 and and CC 10_1101-436634 276 11 “ " `` 10_1101-436634 276 12 exonic exonic JJ 10_1101-436634 276 13 ” " '' 10_1101-436634 276 14 variants variant NNS 10_1101-436634 276 15 , , , 10_1101-436634 276 16 the the DT 10_1101-436634 276 17 region region NN 10_1101-436634 276 18 extends extend VBZ 10_1101-436634 276 19 from from IN 10_1101-436634 276 20 the the DT 10_1101-436634 276 21 3 3 CD 10_1101-436634 276 22 ’ ' '' 10_1101-436634 276 23 end end NN 10_1101-436634 276 24 of of IN 10_1101-436634 276 25 the the DT 10_1101-436634 276 26 exon exon NN 10_1101-436634 276 27 directly directly RB 10_1101-436634 276 28 upstream upstream JJ 10_1101-436634 276 29 of of IN 10_1101-436634 276 30 the 
the DT 10_1101-436634 276 31 variant variant NN 10_1101-436634 276 32 - - HYPH 10_1101-436634 276 33 associated associate VBN 10_1101-436634 276 34 exon exon NNP 10_1101-436634 276 35 to to IN 10_1101-436634 276 36 the the DT 10_1101-436634 276 37 5 5 CD 10_1101-436634 276 38 ’ ' '' 10_1101-436634 276 39 end end NN 10_1101-436634 276 40 of of IN 10_1101-436634 276 41 the the DT 10_1101-436634 276 42 exon exon NN 10_1101-436634 276 43 directly directly RB 10_1101-436634 276 44 downstream downstream JJ 10_1101-436634 276 45 of of IN 10_1101-436634 276 46 it -PRON- PRP 10_1101-436634 276 47 . . . 10_1101-436634 277 1 For for IN 10_1101-436634 277 2 “ " `` 10_1101-436634 277 3 intronic intronic JJ 10_1101-436634 277 4 ” " '' 10_1101-436634 277 5 variants variant NNS 10_1101-436634 277 6 , , , 10_1101-436634 277 7 the the DT 10_1101-436634 277 8 region region NN 10_1101-436634 277 9 is be VBZ 10_1101-436634 277 10 limited limit VBN 10_1101-436634 277 11 to to IN 10_1101-436634 277 12 the the DT 10_1101-436634 277 13 intron intron NN 10_1101-436634 277 14 containing contain VBG 10_1101-436634 277 15 the the DT 10_1101-436634 277 16 variant variant NN 10_1101-436634 277 17 . . . 10_1101-436634 278 1 Single single JJ 10_1101-436634 278 2 - - HYPH 10_1101-436634 278 3 exons exon NNS 10_1101-436634 278 4 can can MD 10_1101-436634 278 5 be be VB 10_1101-436634 278 6 kept keep VBN 10_1101-436634 278 7 with with IN 10_1101-436634 278 8 the the DT 10_1101-436634 278 9 “ " `` 10_1101-436634 278 10 -S -S NNP 10_1101-436634 278 11 ” " '' 10_1101-436634 278 12 option option NN 10_1101-436634 278 13 . . . 
10_1101-436634 279 1 The the DT 10_1101-436634 279 2 annotated annotate VBN 10_1101-436634 279 3 list list NN 10_1101-436634 279 4 of of IN 10_1101-436634 279 5 variants variant NNS 10_1101-436634 279 6 in in IN 10_1101-436634 279 7 VCF VCF NNP 10_1101-436634 279 8 format format NN 10_1101-436634 279 9 ( ( -LRB- 10_1101-436634 279 10 analogous analogous JJ 10_1101-436634 279 11 to to IN 10_1101-436634 279 12 the the DT 10_1101-436634 279 13 output output NN 10_1101-436634 279 14 of of IN 10_1101-436634 279 15 variants variant NNS 10_1101-436634 279 16 annotate annotate JJ 10_1101-436634 279 17 ) ) -RRB- 10_1101-436634 279 18 can can MD 10_1101-436634 279 19 be be VB 10_1101-436634 279 20 written write VBN 10_1101-436634 279 21 to to IN 10_1101-436634 279 22 a a DT 10_1101-436634 279 23 file file NN 10_1101-436634 279 24 with with IN 10_1101-436634 279 25 “ " `` 10_1101-436634 279 26 -v -v -RRB- 10_1101-436634 279 27 /PATH /PATH . 10_1101-436634 279 28 / / SYM 10_1101-436634 279 29 TO TO NNP 10_1101-436634 279 30 / / SYM 10_1101-436634 279 31 FILE FILE NNP 10_1101-436634 279 32 ” " '' 10_1101-436634 279 33 . . . 10_1101-436634 280 1 The the DT 10_1101-436634 280 2 BAM bam NN 10_1101-436634 280 3 file file NN 10_1101-436634 280 4 is be VBZ 10_1101-436634 280 5 then then RB 10_1101-436634 280 6 processed process VBN 10_1101-436634 280 7 in in IN 10_1101-436634 280 8 the the DT 10_1101-436634 280 9 splice splice NN 10_1101-436634 280 10 junction junction NN 10_1101-436634 280 11 regions region NNS 10_1101-436634 280 12 to to TO 10_1101-436634 280 13 produce produce VB 10_1101-436634 280 14 the the DT 10_1101-436634 280 15 list list NN 10_1101-436634 280 16 of of IN 10_1101-436634 280 17 junctions junction NNS 10_1101-436634 280 18 . . . 
10_1101-436634 281 1 A a DT 10_1101-436634 281 2 file file NN 10_1101-436634 281 3 containing contain VBG 10_1101-436634 281 4 these these DT 10_1101-436634 281 5 junctions junction NNS 10_1101-436634 281 6 in in IN 10_1101-436634 281 7 BED12 BED12 NNP 10_1101-436634 281 8 format format NN 10_1101-436634 281 9 ( ( -LRB- 10_1101-436634 281 10 analogous analogous JJ 10_1101-436634 281 11 to to IN 10_1101-436634 281 12 the the DT 10_1101-436634 281 13 output output NN 10_1101-436634 281 14 of of IN 10_1101-436634 281 15 junctions junction NNS 10_1101-436634 281 16 extract extract NN 10_1101-436634 281 17 ) ) -RRB- 10_1101-436634 281 18 can can MD 10_1101-436634 281 19 be be VB 10_1101-436634 281 20 written write VBN 10_1101-436634 281 21 using use VBG 10_1101-436634 281 22 “ " `` 10_1101-436634 281 23 -j -j : 10_1101-436634 281 24 /PATH /PATH . 10_1101-436634 281 25 / / SYM 10_1101-436634 281 26 TO TO NNP 10_1101-436634 281 27 / / SYM 10_1101-436634 281 28 FILE FILE NNP 10_1101-436634 281 29 ” " '' 10_1101-436634 281 30 . . . 
10_1101-436634 282 1 The the DT 10_1101-436634 282 2 minimum minimum JJ 10_1101-436634 282 3 anchor anchor NN 10_1101-436634 282 4 length length NN 10_1101-436634 282 5 , , , 10_1101-436634 282 6 minimum minimum JJ 10_1101-436634 282 7 intron intron NN 10_1101-436634 282 8 length length NN 10_1101-436634 282 9 , , , 10_1101-436634 282 10 and and CC 10_1101-436634 282 11 maximum maximum JJ 10_1101-436634 282 12 intron intron NNP 10_1101-436634 282 13 length length NN 10_1101-436634 282 14 can can MD 10_1101-436634 282 15 be be VB 10_1101-436634 282 16 set set VBN 10_1101-436634 282 17 using use VBG 10_1101-436634 282 18 “ " `` 10_1101-436634 282 19 -a -a NN 10_1101-436634 282 20 ” " '' 10_1101-436634 282 21 , , , 10_1101-436634 282 22 “ " `` 10_1101-436634 282 23 -i -i : 10_1101-436634 282 24 ” " '' 10_1101-436634 282 25 , , , 10_1101-436634 282 26 and and CC 10_1101-436634 282 27 “ " `` 10_1101-436634 282 28 -I -i FW 10_1101-436634 282 29 ” " '' 10_1101-436634 282 30 options option NNS 10_1101-436634 282 31 , , , 10_1101-436634 282 32 just just RB 10_1101-436634 282 33 as as IN 10_1101-436634 282 34 in in IN 10_1101-436634 282 35 junctions junction NNS 10_1101-436634 282 36 extract extract NNP 10_1101-436634 282 37 . . . 10_1101-436634 283 1 The the DT 10_1101-436634 283 2 list list NN 10_1101-436634 283 3 of of IN 10_1101-436634 283 4 junctions junction NNS 10_1101-436634 283 5 produced produce VBN 10_1101-436634 283 6 by by IN 10_1101-436634 283 7 the the DT 10_1101-436634 283 8 preceding precede VBG 10_1101-436634 283 9 step step NN 10_1101-436634 283 10 is be VBZ 10_1101-436634 283 11 then then RB 10_1101-436634 283 12 annotated annotate VBN 10_1101-436634 283 13 with with IN 10_1101-436634 283 14 the the DT 10_1101-436634 283 15 information information NN 10_1101-436634 283 16 presented present VBN 10_1101-436634 283 17 in in IN 10_1101-436634 283 18 junctions junction NNS 10_1101-436634 283 19 annotate annotate JJ 10_1101-436634 283 20 . . . 
10_1101-436634 284 1 Additionally additionally RB 10_1101-436634 284 2 , , , 10_1101-436634 284 3 each each DT 10_1101-436634 284 4 junction junction NN 10_1101-436634 284 5 is be VBZ 10_1101-436634 284 6 annotated annotate VBN 10_1101-436634 284 7 with with IN 10_1101-436634 284 8 a a DT 10_1101-436634 284 9 list list NN 10_1101-436634 284 10 of of IN 10_1101-436634 284 11 associated associated JJ 10_1101-436634 284 12 variants variant NNS 10_1101-436634 284 13 ( ( -LRB- 10_1101-436634 284 14 i.e. i.e. FW 10_1101-436634 285 1 variants variant NNS 10_1101-436634 285 2 whose whose WP$ 10_1101-436634 285 3 splice splice NN 10_1101-436634 285 4 junction junction NN 10_1101-436634 285 5 regions region NNS 10_1101-436634 285 6 overlapped overlap VBD 10_1101-436634 285 7 the the DT 10_1101-436634 285 8 junction junction NN 10_1101-436634 285 9 ) ) -RRB- 10_1101-436634 285 10 . . . 10_1101-436634 286 1 The the DT 10_1101-436634 286 2 final final JJ 10_1101-436634 286 3 output output NN 10_1101-436634 286 4 is be VBZ 10_1101-436634 286 5 written write VBN 10_1101-436634 286 6 to to IN 10_1101-436634 286 7 STDOUT STDOUT NNP 10_1101-436634 286 8 in in IN 10_1101-436634 286 9 TSV TSV NNP 10_1101-436634 286 10 format format NN 10_1101-436634 286 11 ( ( -LRB- 10_1101-436634 286 12 analogous analogous JJ 10_1101-436634 286 13 to to IN 10_1101-436634 286 14 the the DT 10_1101-436634 286 15 output output NN 10_1101-436634 286 16 of of IN 10_1101-436634 286 17 junctions junction NNS 10_1101-436634 286 18 annotate annotate JJ 10_1101-436634 286 19 ) ) -RRB- 10_1101-436634 286 20 or or CC 10_1101-436634 286 21 can can MD 10_1101-436634 286 22 be be VB 10_1101-436634 286 23 redirected redirect VBN 10_1101-436634 286 24 to to IN 10_1101-436634 286 25 a a DT 10_1101-436634 286 26 file file NN 10_1101-436634 286 27 with with IN 10_1101-436634 286 28 “ " `` 10_1101-436634 286 29 -o -o , 10_1101-436634 286 30 /PATH /PATH . 
10_1101-436634 286 31 / / SYM 10_1101-436634 286 32 TO TO NNP 10_1101-436634 286 33 / / SYM 10_1101-436634 286 34 FILE FILE NNP 10_1101-436634 286 35 ” " '' 10_1101-436634 286 36 . . . 10_1101-436634 287 1 Cis Cis NNP 10_1101-436634 287 2 - - HYPH 10_1101-436634 287 3 splice splice NN 10_1101-436634 287 4 - - HYPH 10_1101-436634 287 5 effects effect NNS 10_1101-436634 287 6 associate associate NN 10_1101-436634 287 7 This this DT 10_1101-436634 287 8 command command NN 10_1101-436634 287 9 is be VBZ 10_1101-436634 287 10 similar similar JJ 10_1101-436634 287 11 to to IN 10_1101-436634 287 12 cis cis NN 10_1101-436634 287 13 - - HYPH 10_1101-436634 287 14 splice splice NN 10_1101-436634 287 15 - - HYPH 10_1101-436634 287 16 effects effect NNS 10_1101-436634 287 17 identify identify VBP 10_1101-436634 287 18 , , , 10_1101-436634 287 19 but but CC 10_1101-436634 287 20 takes take VBZ 10_1101-436634 287 21 the the DT 10_1101-436634 287 22 BED BED NNP 10_1101-436634 287 23 output output NN 10_1101-436634 287 24 of of IN 10_1101-436634 287 25 junctions junction NNS 10_1101-436634 287 26 extract extract VBP 10_1101-436634 287 27 in in IN 10_1101-436634 287 28 lieu lieu NN 10_1101-436634 287 29 of of IN 10_1101-436634 287 30 an an DT 10_1101-436634 287 31 alignment alignment NN 10_1101-436634 287 32 file file NN 10_1101-436634 287 33 with with IN 10_1101-436634 287 34 RNA RNA NNP 10_1101-436634 287 35 alignments alignment NNS 10_1101-436634 287 36 . . . 
10_1101-436634 288 1 As as IN 10_1101-436634 288 2 with with IN 10_1101-436634 288 3 cis cis NN 10_1101-436634 288 4 - - HYPH 10_1101-436634 288 5 splice splice NN 10_1101-436634 288 6 - - HYPH 10_1101-436634 288 7 effects effect NNS 10_1101-436634 288 8 identify identify VBP 10_1101-436634 288 9 , , , 10_1101-436634 288 10 each each DT 10_1101-436634 288 11 junction junction NN 10_1101-436634 288 12 is be VBZ 10_1101-436634 288 13 annotated annotate VBN 10_1101-436634 288 14 with with IN 10_1101-436634 288 15 a a DT 10_1101-436634 288 16 list list NN 10_1101-436634 288 17 of of IN 10_1101-436634 288 18 associated associated JJ 10_1101-436634 288 19 variants variant NNS 10_1101-436634 288 20 ( ( -LRB- 10_1101-436634 288 21 i.e. i.e. FW 10_1101-436634 289 1 variants variant NNS 10_1101-436634 289 2 whose whose WP$ 10_1101-436634 289 3 splice splice NN 10_1101-436634 289 4 junction junction NN 10_1101-436634 289 5 regions region NNS 10_1101-436634 289 6 overlapped overlap VBD 10_1101-436634 289 7 the the DT 10_1101-436634 289 8 junction junction NN 10_1101-436634 289 9 ) ) -RRB- 10_1101-436634 289 10 . . . 10_1101-436634 290 1 The the DT 10_1101-436634 290 2 resulting result VBG 10_1101-436634 290 3 output output NN 10_1101-436634 290 4 is be VBZ 10_1101-436634 290 5 then then RB 10_1101-436634 290 6 the the DT 10_1101-436634 290 7 same same JJ 10_1101-436634 290 8 as as IN 10_1101-436634 290 9 cis cis NN 10_1101-436634 290 10 - - HYPH 10_1101-436634 290 11 splice splice NN 10_1101-436634 290 12 - - HYPH 10_1101-436634 290 13 effects effect NNS 10_1101-436634 290 14 identify identify VBP 10_1101-436634 290 15 , , , 10_1101-436634 290 16 but but CC 10_1101-436634 290 17 limited limit VBN 10_1101-436634 290 18 to to IN 10_1101-436634 290 19 the the DT 10_1101-436634 290 20 junctions junction NNS 10_1101-436634 290 21 provided provide VBN 10_1101-436634 290 22 as as IN 10_1101-436634 290 23 input input NN 10_1101-436634 290 24 . . . 
10_1101-436634 292 31 Analysis Analysis NNP 10_1101-436634 292 32 Dataset Dataset NNP 10_1101-436634 292 33 Description Description NNP 10_1101-436634 292 34 32 32 CD 10_1101-436634 292 35 cancer cancer NN 10_1101-436634 292 36 cohorts cohort NNS 10_1101-436634 292 37 were be VBD 10_1101-436634 292 38 analyzed analyze VBN 10_1101-436634 292 39 from from IN 10_1101-436634 292 40 TCGA TCGA NNP 10_1101-436634 292 41 . . . 10_1101-436634 293 1 These these DT 10_1101-436634 293 2 cancer cancer NN 10_1101-436634 293 3 types type NNS 10_1101-436634 293 4 are be VBP 10_1101-436634 293 5 Adrenocortical adrenocortical JJ 10_1101-436634 293 6 carcinoma carcinoma NN 10_1101-436634 293 7 ( ( -LRB- 10_1101-436634 293 8 ACC ACC NNP 10_1101-436634 293 9 ) ) -RRB- 10_1101-436634 293 10 , , , 10_1101-436634 293 11 Bladder Bladder NNP 10_1101-436634 293 12 Urothelial Urothelial NNP 10_1101-436634 293 13 Carcinoma Carcinoma NNP 10_1101-436634 293 14 ( ( -LRB- 10_1101-436634 293 15 BLCA BLCA NNP 10_1101-436634 293 16 ) ) -RRB- 10_1101-436634 293 17 , , , 10_1101-436634 293 18 Brain Brain NNP 10_1101-436634 293 19 Lower low JJR 10_1101-436634 293 20 Grade Grade NNP 10_1101-436634 293 21 Glioma Glioma NNP 10_1101-436634 293 22 ( ( -LRB- 10_1101-436634 293 23 LGG LGG NNP 10_1101-436634 293 24 ) ) -RRB- 10_1101-436634 293 25 , , , 10_1101-436634 293 26 Breast breast NN 10_1101-436634 293 27 invasive invasive JJ 10_1101-436634 293 28 carcinoma carcinoma NN 10_1101-436634 293 29 ( ( -LRB- 10_1101-436634 293 30 BRCA BRCA NNP 
10_1101-436634 293 31 ) ) -RRB- 10_1101-436634 293 32 , , , 10_1101-436634 293 33 Cervical cervical JJ 10_1101-436634 293 34 squamous squamous JJ 10_1101-436634 293 35 cell cell NN 10_1101-436634 293 36 carcinoma carcinoma NN 10_1101-436634 293 37 and and CC 10_1101-436634 293 38 endocervical endocervical JJ 10_1101-436634 293 39 adenocarcinoma adenocarcinoma NN 10_1101-436634 293 40 ( ( -LRB- 10_1101-436634 293 41 CESC CESC NNP 10_1101-436634 293 42 ) ) -RRB- 10_1101-436634 293 43 , , , 10_1101-436634 293 44 Cholangiocarcinoma Cholangiocarcinoma NNP 10_1101-436634 293 45 ( ( -LRB- 10_1101-436634 293 46 CHOL CHOL NNP 10_1101-436634 293 47 ) ) -RRB- 10_1101-436634 293 48 , , , 10_1101-436634 293 49 Colon colon NN 10_1101-436634 293 50 adenocarcinoma adenocarcinoma NN 10_1101-436634 293 51 ( ( -LRB- 10_1101-436634 293 52 COAD COAD NNP 10_1101-436634 293 53 ) ) -RRB- 10_1101-436634 293 54 , , , 10_1101-436634 293 55 Esophageal esophageal JJ 10_1101-436634 293 56 carcinoma carcinoma NN 10_1101-436634 293 57 ( ( -LRB- 10_1101-436634 293 58 ESCA ESCA NNP 10_1101-436634 293 59 ) ) -RRB- 10_1101-436634 293 60 , , , 10_1101-436634 293 61 Glioblastoma Glioblastoma NNP 10_1101-436634 293 62 multiforme multiforme NN 10_1101-436634 293 63 ( ( -LRB- 10_1101-436634 293 64 GBM GBM NNP 10_1101-436634 293 65 ) ) -RRB- 10_1101-436634 293 66 , , , 10_1101-436634 293 67 Head Head NNP 10_1101-436634 293 68 and and CC 10_1101-436634 293 69 Neck Neck NNP 10_1101-436634 293 70 squamous squamous JJ 10_1101-436634 293 71 cell cell NN 10_1101-436634 293 72 carcinoma carcinoma NN 10_1101-436634 293 73 ( ( -LRB- 10_1101-436634 293 74 HNSC HNSC NNP 10_1101-436634 293 75 ) ) -RRB- 10_1101-436634 293 76 , , , 10_1101-436634 293 77 Kidney Kidney NNP 10_1101-436634 293 78 Chromophobe Chromophobe NNP 10_1101-436634 293 79 ( ( -LRB- 10_1101-436634 293 80 KICH KICH NNP 10_1101-436634 293 81 ) ) -RRB- 10_1101-436634 293 82 , , , 10_1101-436634 293 83 Kidney Kidney NNP 10_1101-436634 293 84 renal renal 
JJ 10_1101-436634 293 85 clear clear JJ 10_1101-436634 293 86 cell cell NN 10_1101-436634 293 87 carcinoma carcinoma NN 10_1101-436634 293 88 ( ( -LRB- 10_1101-436634 293 89 KIRC KIRC NNP 10_1101-436634 293 90 ) ) -RRB- 10_1101-436634 293 91 , , , 10_1101-436634 293 92 Kidney Kidney NNP 10_1101-436634 293 93 renal renal JJ 10_1101-436634 293 94 papillary papillary JJ 10_1101-436634 293 95 cell cell NN 10_1101-436634 293 96 carcinoma carcinoma NN 10_1101-436634 293 97 ( ( -LRB- 10_1101-436634 293 98 KIRP KIRP NNP 10_1101-436634 293 99 ) ) -RRB- 10_1101-436634 293 100 , , , 10_1101-436634 293 101 Liver Liver NNP 10_1101-436634 293 102 hepatocellular hepatocellular NN 10_1101-436634 293 103 carcinoma carcinoma NN 10_1101-436634 293 104 ( ( -LRB- 10_1101-436634 293 105 LIHC LIHC NNP 10_1101-436634 293 106 ) ) -RRB- 10_1101-436634 293 107 , , , 10_1101-436634 293 108 Lung Lung NNP 10_1101-436634 293 109 adenocarcinoma adenocarcinoma NN 10_1101-436634 293 110 ( ( -LRB- 10_1101-436634 293 111 LUAD luad RB 10_1101-436634 293 112 ) ) -RRB- 10_1101-436634 293 113 , , , 10_1101-436634 293 114 Lung Lung NNP 10_1101-436634 293 115 squamous squamous JJ 10_1101-436634 293 116 cell cell NN 10_1101-436634 293 117 carcinoma carcinoma NN 10_1101-436634 293 118 ( ( -LRB- 10_1101-436634 293 119 LUSC LUSC NNP 10_1101-436634 293 120 ) ) -RRB- 10_1101-436634 293 121 , , , 10_1101-436634 293 122 Lymphoid Lymphoid NNP 10_1101-436634 293 123 Neoplasm Neoplasm NNP 10_1101-436634 293 124 Diffuse Diffuse NNP 10_1101-436634 293 125 Large large JJ 10_1101-436634 293 126 B b NN 10_1101-436634 293 127 cell cell NN 10_1101-436634 293 128 Lymphoma Lymphoma NNP 10_1101-436634 293 129 ( ( -LRB- 10_1101-436634 293 130 DLBC DLBC NNP 10_1101-436634 293 131 ) ) -RRB- 10_1101-436634 293 132 , , , 10_1101-436634 293 133 Mesothelioma Mesothelioma NNP 10_1101-436634 293 134 ( ( -LRB- 10_1101-436634 293 135 MESO MESO NNP 10_1101-436634 293 136 ) ) -RRB- 10_1101-436634 293 137 , , , 10_1101-436634 293 138 
Ovarian ovarian JJ 10_1101-436634 293 139 serous serous JJ 10_1101-436634 293 140 cystadenocarcinoma cystadenocarcinoma NN 10_1101-436634 293 141 ( ( -LRB- 10_1101-436634 293 142 OV OV NNP 10_1101-436634 293 143 ) ) -RRB- 10_1101-436634 293 144 , , , 10_1101-436634 293 145 Pancreatic pancreatic JJ 10_1101-436634 293 146 adenocarcinoma adenocarcinoma NN 10_1101-436634 293 147 ( ( -LRB- 10_1101-436634 293 148 PAAD PAAD NNP 10_1101-436634 293 149 ) ) -RRB- 10_1101-436634 293 150 , , , 10_1101-436634 293 151 Pheochromocytoma Pheochromocytoma NNP 10_1101-436634 293 152 and and CC 10_1101-436634 293 153 Paraganglioma Paraganglioma NNP 10_1101-436634 293 154 ( ( -LRB- 10_1101-436634 293 155 PCPG PCPG NNP 10_1101-436634 293 156 ) ) -RRB- 10_1101-436634 293 157 , , , 10_1101-436634 293 158 Prostate Prostate NNP 10_1101-436634 293 159 adenocarcinoma adenocarcinoma NN 10_1101-436634 293 160 ( ( -LRB- 10_1101-436634 293 161 PRAD PRAD NNP 10_1101-436634 293 162 ) ) -RRB- 10_1101-436634 293 163 , , , 10_1101-436634 293 164 Rectum Rectum NNP 10_1101-436634 293 165 adenocarcinoma adenocarcinoma NN 10_1101-436634 293 166 ( ( -LRB- 10_1101-436634 293 167 READ READ NNP 10_1101-436634 293 168 ) ) -RRB- 10_1101-436634 293 169 , , , 10_1101-436634 293 170 Sarcoma Sarcoma NNP 10_1101-436634 293 171 ( ( -LRB- 10_1101-436634 293 172 SARC SARC NNP 10_1101-436634 293 173 ) ) -RRB- 10_1101-436634 293 174 , , , 10_1101-436634 293 175 Skin Skin NNP 10_1101-436634 293 176 Cutaneous Cutaneous NNP 10_1101-436634 293 177 Melanoma Melanoma NNP 10_1101-436634 293 178 ( ( -LRB- 10_1101-436634 293 179 SKCM SKCM NNP 10_1101-436634 293 180 ) ) -RRB- 10_1101-436634 293 181 , , , 10_1101-436634 293 182 Stomach stomach NN 10_1101-436634 293 183 adenocarcinoma adenocarcinoma NN 10_1101-436634 293 184 ( ( -LRB- 10_1101-436634 293 185 STAD STAD NNP 10_1101-436634 293 186 ) ) -RRB- 10_1101-436634 293 187 , , , 10_1101-436634 293 188 Testicular Testicular NNP 10_1101-436634 293 189 Germ Germ NNP 10_1101-436634 
293 190 Cell Cell NNP 10_1101-436634 293 191 Tumors Tumors NNPS 10_1101-436634 293 192 ( ( -LRB- 10_1101-436634 293 193 TGCT TGCT NNP 10_1101-436634 293 194 ) ) -RRB- 10_1101-436634 293 195 , , , 10_1101-436634 293 196 Thymoma Thymoma NNP 10_1101-436634 293 197 ( ( -LRB- 10_1101-436634 293 198 THYM THYM NNP 10_1101-436634 293 199 ) ) -RRB- 10_1101-436634 293 200 , , , 10_1101-436634 293 201 Thyroid Thyroid NNP 10_1101-436634 293 202 carcinoma carcinoma NN 10_1101-436634 293 203 ( ( -LRB- 10_1101-436634 293 204 THCA THCA NNP 10_1101-436634 293 205 ) ) -RRB- 10_1101-436634 293 206 , , , 10_1101-436634 293 207 Uterine Uterine NNP 10_1101-436634 293 208 Carcinosarcoma Carcinosarcoma NNP 10_1101-436634 293 209 ( ( -LRB- 10_1101-436634 293 210 UCS UCS NNP 10_1101-436634 293 211 ) ) -RRB- 10_1101-436634 293 212 , , , 10_1101-436634 293 213 Uterine Uterine NNP 10_1101-436634 293 214 Corpus Corpus NNP 10_1101-436634 293 215 Endometrial Endometrial NNP 10_1101-436634 293 216 Carcinoma Carcinoma NNP 10_1101-436634 293 217 ( ( -LRB- 10_1101-436634 293 218 UCEC UCEC NNP 10_1101-436634 293 219 ) ) -RRB- 10_1101-436634 293 220 , , , 10_1101-436634 293 221 and and CC 10_1101-436634 293 222 Uveal Uveal NNP 10_1101-436634 293 223 Melanoma Melanoma NNP 10_1101-436634 293 224 ( ( -LRB- 10_1101-436634 293 225 UVM UVM NNP 10_1101-436634 293 226 ) ) -RRB- 10_1101-436634 293 227 . . . 10_1101-436634 294 1 Three three CD 10_1101-436634 294 2 cohorts cohort NNS 10_1101-436634 294 3 were be VBD 10_1101-436634 294 4 derived derive VBN 10_1101-436634 294 5 from from IN 10_1101-436634 294 6 patients patient NNS 10_1101-436634 294 7 at at IN 10_1101-436634 294 8 Washington Washington NNP 10_1101-436634 294 9 University University NNP 10_1101-436634 294 10 in in IN 10_1101-436634 294 11 St. St. NNP 10_1101-436634 294 12 Louis Louis NNP 10_1101-436634 294 13 . . . 
10_1101-436634 295 1 These these DT 10_1101-436634 295 2 cohorts cohort NNS 10_1101-436634 295 3 are be VBP 10_1101-436634 295 4 Hepatocellular Hepatocellular NNP 10_1101-436634 295 5 Carcinoma Carcinoma NNP 10_1101-436634 295 6 ( ( -LRB- 10_1101-436634 295 7 HCC HCC NNP 10_1101-436634 295 8 ) ) -RRB- 10_1101-436634 295 9 , , , 10_1101-436634 295 10 Oral Oral NNP 10_1101-436634 295 11 Squamous Squamous NNP 10_1101-436634 295 12 Cell Cell NNP 10_1101-436634 295 13 Carcinoma Carcinoma NNP 10_1101-436634 295 14 ( ( -LRB- 10_1101-436634 295 15 OSCC OSCC NNP 10_1101-436634 295 16 ) ) -RRB- 10_1101-436634 295 17 , , , 10_1101-436634 295 18 and and CC 10_1101-436634 295 19 Small Small NNP 10_1101-436634 295 20 Cell Cell NNP 10_1101-436634 295 21 Lung Lung NNP 10_1101-436634 295 22 Cancer Cancer NNP 10_1101-436634 295 23 ( ( -LRB- 10_1101-436634 295 24 SCLC SCLC NNP 10_1101-436634 295 25 ) ) -RRB- 10_1101-436634 295 26 . . . 10_1101-436634 296 1 Sample sample NN 10_1101-436634 296 2 processing processing NN 10_1101-436634 296 3 We -PRON- PRP 10_1101-436634 296 4 applied apply VBD 10_1101-436634 296 5 RegTools regtool NNS 10_1101-436634 296 6 to to TO 10_1101-436634 296 7 35 35 CD 10_1101-436634 296 8 tumor tumor NN 10_1101-436634 296 9 cohorts cohort NNS 10_1101-436634 296 10 . . . 10_1101-436634 297 1 Genomic genomic JJ 10_1101-436634 297 2 and and CC 10_1101-436634 297 3 transcriptomic transcriptomic JJ 10_1101-436634 297 4 data datum NNS 10_1101-436634 297 5 for for IN 10_1101-436634 297 6 32 32 CD 10_1101-436634 297 7 cohorts cohort NNS 10_1101-436634 297 8 were be VBD 10_1101-436634 297 9 obtained obtain VBN 10_1101-436634 297 10 from from IN 10_1101-436634 297 11 The the DT 10_1101-436634 297 12 Cancer Cancer NNP 10_1101-436634 297 13 Genome Genome NNP 10_1101-436634 297 14 Atlas Atlas NNP 10_1101-436634 297 15 ( ( -LRB- 10_1101-436634 297 16 TCGA TCGA NNP 10_1101-436634 297 17 ) ) -RRB- 10_1101-436634 297 18 . . . 
10_1101-436634 298 1 Information information NN 10_1101-436634 298 2 regarding regard VBG 10_1101-436634 298 3 the the DT 10_1101-436634 298 4 alignment alignment NN 10_1101-436634 298 5 and and CC 10_1101-436634 298 6 variant variant JJ 10_1101-436634 298 7 calling calling NN 10_1101-436634 298 8 for for IN 10_1101-436634 298 9 these these DT 10_1101-436634 298 10 samples sample NNS 10_1101-436634 298 11 is be VBZ 10_1101-436634 298 12 described describe VBN 10_1101-436634 298 13 by by IN 10_1101-436634 298 14 the the DT 10_1101-436634 298 15 Genomic Genomic NNP 10_1101-436634 298 16 Data Data NNP 10_1101-436634 298 17 Commons Commons NNP 10_1101-436634 298 18 data data NN 10_1101-436634 298 19 harmonization harmonization NN 10_1101-436634 298 20 effort47 effort47 NNS 10_1101-436634 298 21 . . . 10_1101-436634 299 1 Whole whole JJ 10_1101-436634 299 2 exome exome JJ 10_1101-436634 299 3 sequencing sequencing NN 10_1101-436634 299 4 ( ( -LRB- 10_1101-436634 299 5 WES WES NNP 10_1101-436634 299 6 ) ) -RRB- 10_1101-436634 299 7 mutation mutation NN 10_1101-436634 299 8 calls call NNS 10_1101-436634 299 9 for for IN 10_1101-436634 299 10 these these DT 10_1101-436634 299 11 samples sample NNS 10_1101-436634 299 12 from from IN 10_1101-436634 299 13 MuSE48 muse48 NN 10_1101-436634 299 14 , , , 10_1101-436634 299 15 MuTect249 mutect249 NN 10_1101-436634 299 16 , , , 10_1101-436634 299 17 VarScan250 VarScan250 NNP 10_1101-436634 299 18 , , , 10_1101-436634 299 19 and and CC 10_1101-436634 299 20 SomaticSniper51 somaticsniper51 NN 10_1101-436634 299 21 , , , 10_1101-436634 299 22 were be VBD 10_1101-436634 299 23 left leave VBN 10_1101-436634 299 24 - - HYPH 10_1101-436634 299 25 aligned align VBN 10_1101-436634 299 26 , , , 10_1101-436634 299 27 trimmed trim VBN 10_1101-436634 299 28 , , , 10_1101-436634 299 29 and and CC 10_1101-436634 299 30 decomposed decompose VBD 10_1101-436634 299 31 to to TO 10_1101-436634 299 32 ensure ensure VB 10_1101-436634 299 33 the the DT 
10_1101-436634 299 34 correct correct JJ 10_1101-436634 299 35 representation representation NN 10_1101-436634 299 36 of of IN 10_1101-436634 299 37 the the DT 10_1101-436634 299 38 variants variant NNS 10_1101-436634 299 39 across across IN 10_1101-436634 299 40 the the DT 10_1101-436634 299 41 multiple multiple JJ 10_1101-436634 299 42 callers caller NNS 10_1101-436634 299 43 . . . 10_1101-436634 300 1 Samples sample NNS 10_1101-436634 300 2 for for IN 10_1101-436634 300 3 the the DT 10_1101-436634 300 4 remaining remain VBG 10_1101-436634 300 5 three three CD 10_1101-436634 300 6 cohorts cohort NNS 10_1101-436634 300 7 , , , 10_1101-436634 300 8 HCC HCC NNP 10_1101-436634 300 9 , , , 10_1101-436634 300 10 SCLC SCLC NNP 10_1101-436634 300 11 , , , 10_1101-436634 300 12 and and CC 10_1101-436634 300 13 OSCC OSCC NNP 10_1101-436634 300 14 , , , 10_1101-436634 300 15 were be VBD 10_1101-436634 300 16 sequenced sequence VBN 10_1101-436634 300 17 at at IN 10_1101-436634 300 18 Washington Washington NNP 10_1101-436634 300 19 University University NNP 10_1101-436634 300 20 in in IN 10_1101-436634 300 21 St. St. NNP 10_1101-436634 300 22 Louis Louis NNP 10_1101-436634 300 23 . . . 10_1101-436634 301 1 Genomic genomic JJ 10_1101-436634 301 2 data datum NNS 10_1101-436634 301 3 were be VBD 10_1101-436634 301 4 produced produce VBN 10_1101-436634 301 5 by by IN 10_1101-436634 301 6 WES WES NNP 10_1101-436634 301 7 for for IN 10_1101-436634 301 8 SCLC SCLC NNP 10_1101-436634 301 9 and and CC 10_1101-436634 301 10 OSCC OSCC NNP 10_1101-436634 301 11 and and CC 10_1101-436634 301 12 whole whole JJ 10_1101-436634 301 13 genome genome JJ 10_1101-436634 301 14 sequencing sequencing NN 10_1101-436634 301 15 ( ( -LRB- 10_1101-436634 301 16 WGS WGS NNP 10_1101-436634 301 17 ) ) -RRB- 10_1101-436634 301 18 for for IN 10_1101-436634 301 19 HCC HCC NNP 10_1101-436634 301 20 . . . 
10_1101-436634 302 1 Normal normal JJ 10_1101-436634 302 2 genomic genomic JJ 10_1101-436634 302 3 data datum NNS 10_1101-436634 302 4 of of IN 10_1101-436634 302 5 the the DT 10_1101-436634 302 6 same same JJ 10_1101-436634 302 7 sequencing sequencing NN 10_1101-436634 302 8 type type NN 10_1101-436634 302 9 and and CC 10_1101-436634 302 10 tumor tumor NN 10_1101-436634 302 11 RNA RNA NNP 10_1101-436634 302 12 - - HYPH 10_1101-436634 302 13 seq seq NN 10_1101-436634 302 14 data datum NNS 10_1101-436634 302 15 were be VBD 10_1101-436634 302 16 also also RB 10_1101-436634 302 17 available available JJ 10_1101-436634 302 18 for for IN 10_1101-436634 302 19 all all DT 10_1101-436634 302 20 subjects subject NNS 10_1101-436634 302 21 . . . 10_1101-436634 303 1 Sequence sequence NN 10_1101-436634 303 2 data datum NNS 10_1101-436634 303 3 were be VBD 10_1101-436634 303 4 aligned align VBN 10_1101-436634 303 5 using use VBG 10_1101-436634 303 6 the the DT 10_1101-436634 303 7 Genome Genome NNP 10_1101-436634 303 8 Modeling Modeling NNP 10_1101-436634 303 9 System System NNP 10_1101-436634 303 10 ( ( -LRB- 10_1101-436634 303 11 GMS)52 GMS)52 NNP 10_1101-436634 303 12 using use VBG 10_1101-436634 303 13 TopHat2 TopHat2 NNP 10_1101-436634 303 14 for for IN 10_1101-436634 303 15 RNA RNA NNP 10_1101-436634 303 16 and and CC 10_1101-436634 303 17 BWA BWA NNP 10_1101-436634 303 18 - - HYPH 10_1101-436634 303 19 MEM53 MEM53 NNP 10_1101-436634 303 20 for for IN 10_1101-436634 303 21 DNA DNA NNP 10_1101-436634 303 22 . . . 
10_1101-436634 304 1 HCC HCC NNP 10_1101-436634 304 2 and and CC 10_1101-436634 304 3 SCLC SCLC NNP 10_1101-436634 304 4 were be VBD 10_1101-436634 304 5 aligned align VBN 10_1101-436634 304 6 to to IN 10_1101-436634 304 7 GRCh37 GRCh37 NNP 10_1101-436634 304 8 while while IN 10_1101-436634 304 9 OSCC OSCC NNP 10_1101-436634 304 10 was be VBD 10_1101-436634 304 11 aligned align VBN 10_1101-436634 304 12 to to IN 10_1101-436634 304 13 GRCh38 GRCh38 NNP 10_1101-436634 304 14 . . . 10_1101-436634 305 1 Somatic somatic JJ 10_1101-436634 305 2 variant variant JJ 10_1101-436634 305 3 calls call NNS 10_1101-436634 305 4 were be VBD 10_1101-436634 305 5 made make VBN 10_1101-436634 305 6 using use VBG 10_1101-436634 305 7 Samtools Samtools NNP 10_1101-436634 305 8 v0.1.126 v0.1.126 NN 10_1101-436634 305 9 , , , 10_1101-436634 305 10 SomaticSniper2 somaticsniper2 CD 10_1101-436634 305 11 v1.0.251 v1.0.251 NNS 10_1101-436634 305 12 , , , 10_1101-436634 305 13 Strelka Strelka NNP 10_1101-436634 305 14 V0.4.6.254 V0.4.6.254 NNS 10_1101-436634 305 15 , , , 10_1101-436634 305 16 and and CC 10_1101-436634 305 17 VarScan VarScan NNP 10_1101-436634 305 18 v2.2.650,54 v2.2.650,54 NNP 10_1101-436634 305 19 through through IN 10_1101-436634 305 20 the the DT 10_1101-436634 305 21 GMS GMS NNP 10_1101-436634 305 22 . . . 
10_1101-436634 306 1 High high JJ 10_1101-436634 306 2 - - HYPH 10_1101-436634 306 3 quality quality NN 10_1101-436634 306 4 mutations mutation NNS 10_1101-436634 306 5 for for IN 10_1101-436634 306 6 all all DT 10_1101-436634 306 7 samples sample NNS 10_1101-436634 306 8 were be VBD 10_1101-436634 306 9 then then RB 10_1101-436634 306 10 selected select VBN 10_1101-436634 306 11 by by IN 10_1101-436634 306 12 requiring require VBG 10_1101-436634 306 13 that that IN 10_1101-436634 306 14 a a DT 10_1101-436634 306 15 variant variant JJ 10_1101-436634 306 16 be be NN 10_1101-436634 306 17 called call VBN 10_1101-436634 306 18 by by IN 10_1101-436634 306 19 two two CD 10_1101-436634 306 20 of of IN 10_1101-436634 306 21 the the DT 10_1101-436634 306 22 four four CD 10_1101-436634 306 23 variant variant JJ 10_1101-436634 306 24 callers caller NNS 10_1101-436634 306 25 . . . 10_1101-436634 307 1 Candidate candidate NN 10_1101-436634 307 2 junction junction NN 10_1101-436634 307 3 filtering filter VBG 10_1101-436634 307 4 To to TO 10_1101-436634 307 5 generate generate VB 10_1101-436634 307 6 results result NNS 10_1101-436634 307 7 for for IN 10_1101-436634 307 8 4 4 CD 10_1101-436634 307 9 splice splice NN 10_1101-436634 307 10 variant variant JJ 10_1101-436634 307 11 window window NN 10_1101-436634 307 12 sizes size NNS 10_1101-436634 307 13 , , , 10_1101-436634 307 14 we -PRON- PRP 10_1101-436634 307 15 ran run VBD 10_1101-436634 307 16 cis cis NN 10_1101-436634 307 17 - - HYPH 10_1101-436634 307 18 splice splice NN 10_1101-436634 307 19 - - HYPH 10_1101-436634 307 20 effects effect NNS 10_1101-436634 307 21 identify identify VBP 10_1101-436634 307 22 with with IN 10_1101-436634 307 23 4 4 CD 10_1101-436634 307 24 sets set NNS 10_1101-436634 307 25 of of IN 10_1101-436634 307 26 splice splice NN 10_1101-436634 307 27 variant variant JJ 10_1101-436634 307 28 window window NN 10_1101-436634 307 29 parameters parameter NNS 10_1101-436634 307 30 . . . 
10_1101-436634 308 1 For for IN 10_1101-436634 308 2 our -PRON- PRP$ 10_1101-436634 308 3 “ " `` 10_1101-436634 308 4 i2e3 i2e3 NNP 10_1101-436634 308 5 ” " '' 10_1101-436634 308 6 window window NN 10_1101-436634 308 7 ( ( -LRB- 10_1101-436634 308 8 RegTools RegTools NNP 10_1101-436634 308 9 default default NN 10_1101-436634 308 10 ) ) -RRB- 10_1101-436634 308 11 , , , 10_1101-436634 308 12 to to TO 10_1101-436634 308 13 examine examine VB 10_1101-436634 308 14 intronic intronic JJ 10_1101-436634 308 15 variants variant NNS 10_1101-436634 308 16 within within IN 10_1101-436634 308 17 2 2 CD 10_1101-436634 308 18 bases basis NNS 10_1101-436634 308 19 and and CC 10_1101-436634 308 20 exonic exonic JJ 10_1101-436634 308 21 variants variant NNS 10_1101-436634 308 22 within within IN 10_1101-436634 308 23 3 3 CD 10_1101-436634 308 24 bases basis NNS 10_1101-436634 308 25 of of IN 10_1101-436634 308 26 the the DT 10_1101-436634 308 27 exon exon JJ 10_1101-436634 308 28 edge edge NN 10_1101-436634 308 29 , , , 10_1101-436634 308 30 we -PRON- PRP 10_1101-436634 308 31 set set VBP 10_1101-436634 308 32 “ " `` 10_1101-436634 308 33 -i -i : 10_1101-436634 308 34 2 2 CD 10_1101-436634 308 35 -e -e SYM 10_1101-436634 308 36 3 3 CD 10_1101-436634 308 37 ” " '' 10_1101-436634 308 38 . . . 
10_1101-436634 309 1 Similarly similarly RB 10_1101-436634 309 2 , , , 10_1101-436634 309 3 for for IN 10_1101-436634 309 4 “ " `` 10_1101-436634 309 5 i50e5 i50e5 NNP 10_1101-436634 309 6 ” " '' 10_1101-436634 309 7 , , , 10_1101-436634 309 8 to to TO 10_1101-436634 309 9 examine examine VB 10_1101-436634 309 10 intronic intronic JJ 10_1101-436634 309 11 variants variant NNS 10_1101-436634 309 12 within within IN 10_1101-436634 309 13 50 50 CD 10_1101-436634 309 14 bases basis NNS 10_1101-436634 309 15 and and CC 10_1101-436634 309 16 exonic exonic JJ 10_1101-436634 309 17 variants variant NNS 10_1101-436634 309 18 within within IN 10_1101-436634 309 19 5 5 CD 10_1101-436634 309 20 bases basis NNS 10_1101-436634 309 21 of of IN 10_1101-436634 309 22 the the DT 10_1101-436634 309 23 exon exon JJ 10_1101-436634 309 24 edge edge NN 10_1101-436634 309 25 , , , 10_1101-436634 309 26 we -PRON- PRP 10_1101-436634 309 27 set set VBP 10_1101-436634 309 28 “ " `` 10_1101-436634 309 29 -i -i : 10_1101-436634 309 30 50 50 CD 10_1101-436634 309 31 -e -e SYM 10_1101-436634 309 32 5 5 CD 10_1101-436634 309 33 ” " '' 10_1101-436634 309 34 . . . 
10_1101-436634 310 1 To to TO 10_1101-436634 310 2 view view VB 10_1101-436634 310 3 all all DT 10_1101-436634 310 4 exonic exonic JJ 10_1101-436634 310 5 variants variant NNS 10_1101-436634 310 6 , , , 10_1101-436634 310 7 we -PRON- PRP 10_1101-436634 310 8 simply simply RB 10_1101-436634 310 9 set set VBP 10_1101-436634 310 10 “ " `` 10_1101-436634 310 11 - - HYPH 10_1101-436634 311 31 E e NN 10_1101-436634 311 32 ” " '' 10_1101-436634 311 33 , , , 10_1101-436634 311 34 without without IN 10_1101-436634 311 35 “ " `` 10_1101-436634 311 36 -i -i : 10_1101-436634 311 37 ” " '' 10_1101-436634 311 38 or or CC 10_1101-436634 311 39 “ " `` 10_1101-436634 311 40 -e -e : 10_1101-436634 311 41 ” " '' 10_1101-436634 311 42 options option NNS 10_1101-436634 311 43 . . . 
10_1101-436634 312 1 To to TO 10_1101-436634 312 2 view view VB 10_1101-436634 312 3 all all DT 10_1101-436634 312 4 intronic intronic JJ 10_1101-436634 312 5 variants variant NNS 10_1101-436634 312 6 , , , 10_1101-436634 312 7 we -PRON- PRP 10_1101-436634 312 8 simply simply RB 10_1101-436634 312 9 set set VBP 10_1101-436634 312 10 “ " `` 10_1101-436634 312 11 -I -i FW 10_1101-436634 312 12 ” " '' 10_1101-436634 312 13 , , , 10_1101-436634 312 14 without without IN 10_1101-436634 312 15 “ " `` 10_1101-436634 312 16 -i -i : 10_1101-436634 312 17 ” " '' 10_1101-436634 312 18 or or CC 10_1101-436634 312 19 “ " `` 10_1101-436634 312 20 -e -e : 10_1101-436634 312 21 ” " '' 10_1101-436634 312 22 options option NNS 10_1101-436634 312 23 . . . 10_1101-436634 313 1 TCGA tcga NN 10_1101-436634 313 2 samples sample NNS 10_1101-436634 313 3 were be VBD 10_1101-436634 313 4 processed process VBN 10_1101-436634 313 5 with with IN 10_1101-436634 313 6 GRCh38.d1.vd1.fa GRCh38.d1.vd1.fa HYPH 10_1101-436634 313 7 ( ( -LRB- 10_1101-436634 313 8 downloaded download VBN 10_1101-436634 313 9 from from IN 10_1101-436634 313 10 the the DT 10_1101-436634 313 11 GDC GDC NNP 10_1101-436634 313 12 reference reference NN 10_1101-436634 313 13 file file NN 10_1101-436634 313 14 page page NN 10_1101-436634 313 15 at at IN 10_1101-436634 313 16 https://gdc.cancer.gov/about-data/gdc-data-processing/gdc-reference-files https://gdc.cancer.gov/about-data/gdc-data-processing/gdc-reference-files NN 10_1101-436634 313 18 ) ) -RRB- 10_1101-436634 313 19 as as IN 10_1101-436634 313 20 the the DT 10_1101-436634 313 21 reference reference NN 10_1101-436634 313 22 fasta fasta VBZ 10_1101-436634 313 23 file file NN 10_1101-436634 313 24 and and CC 10_1101-436634 313 25 gencode.v29.annotation.gtf gencode.v29.annotation.gtf NNP 10_1101-436634 313 26 ( ( -LRB- 10_1101-436634 313 27 downloaded download VBN 10_1101-436634 313 28 via via IN 10_1101-436634 313 29 the the DT 10_1101-436634 
313 30 GENCODE GENCODE NNP 10_1101-436634 313 31 FTP FTP NNP 10_1101-436634 313 32 ) ) -RRB- 10_1101-436634 313 33 as as IN 10_1101-436634 313 34 the the DT 10_1101-436634 313 35 reference reference NN 10_1101-436634 313 36 transcriptome transcriptome VBP 10_1101-436634 313 37 . . . 10_1101-436634 314 1 OSCC OSCC NNP 10_1101-436634 314 2 was be VBD 10_1101-436634 314 3 processed process VBN 10_1101-436634 314 4 with with IN 10_1101-436634 314 5 Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa NNP 10_1101-436634 314 6 and and CC 10_1101-436634 314 7 Homo_sapiens.GRCh38.79.gtf Homo_sapiens.GRCh38.79.gtf NNS 10_1101-436634 314 8 ( ( -LRB- 10_1101-436634 314 9 both both DT 10_1101-436634 314 10 downloaded download VBN 10_1101-436634 314 11 from from IN 10_1101-436634 314 12 Ensembl Ensembl NNP 10_1101-436634 314 13 ) ) -RRB- 10_1101-436634 314 14 . . . 10_1101-436634 315 1 HCC HCC NNP 10_1101-436634 315 2 and and CC 10_1101-436634 315 3 SCLC SCLC NNP 10_1101-436634 315 4 were be VBD 10_1101-436634 315 5 processed process VBN 10_1101-436634 315 6 with with IN 10_1101-436634 315 7 Homo_sapiens.GRCh37.dna_sm.primary_assembly.fa Homo_sapiens.GRCh37.dna_sm.primary_assembly.fa NNP 10_1101-436634 315 8 and and CC 10_1101-436634 315 9 Homo_sapiens.GRCh37.87.gtf Homo_sapiens.GRCh37.87.gtf NNP 10_1101-436634 315 10 ( ( -LRB- 10_1101-436634 315 11 both both DT 10_1101-436634 315 12 downloaded download VBN 10_1101-436634 315 13 from from IN 10_1101-436634 315 14 Ensembl Ensembl NNP 10_1101-436634 315 15 ) ) -RRB- 10_1101-436634 315 16 . . . 
10_1101-436634 316 1 Statistical statistical JJ 10_1101-436634 316 2 filtering filtering NN 10_1101-436634 316 3 of of IN 10_1101-436634 316 4 candidate candidate NN 10_1101-436634 316 5 events event NNS 10_1101-436634 316 6 We -PRON- PRP 10_1101-436634 316 7 refer refer VBP 10_1101-436634 316 8 to to IN 10_1101-436634 316 9 a a DT 10_1101-436634 316 10 statistical statistical JJ 10_1101-436634 316 11 association association NN 10_1101-436634 316 12 between between IN 10_1101-436634 316 13 a a DT 10_1101-436634 316 14 variant variant NN 10_1101-436634 316 15 and and CC 10_1101-436634 316 16 a a DT 10_1101-436634 316 17 junction junction NN 10_1101-436634 316 18 as as IN 10_1101-436634 316 19 an an DT 10_1101-436634 316 20 “ " `` 10_1101-436634 316 21 event event NN 10_1101-436634 316 22 ” " '' 10_1101-436634 316 23 . . . 10_1101-436634 317 1 For for IN 10_1101-436634 317 2 each each DT 10_1101-436634 317 3 event event NN 10_1101-436634 317 4 identified identify VBN 10_1101-436634 317 5 by by IN 10_1101-436634 317 6 RegTools RegTools NNP 10_1101-436634 317 7 , , , 10_1101-436634 317 8 a a DT 10_1101-436634 317 9 normalized normalized JJ 10_1101-436634 317 10 score score NN 10_1101-436634 317 11 ( ( -LRB- 10_1101-436634 317 12 norm_score norm_score NNP 10_1101-436634 317 13 ) ) -RRB- 10_1101-436634 317 14 was be VBD 10_1101-436634 317 15 calculated calculate VBN 10_1101-436634 317 16 for for IN 10_1101-436634 317 17 the the DT 10_1101-436634 317 18 junction junction NN 10_1101-436634 317 19 of of IN 10_1101-436634 317 20 the the DT 10_1101-436634 317 21 event event NN 10_1101-436634 317 22 by by IN 10_1101-436634 317 23 dividing divide VBG 10_1101-436634 317 24 the the DT 10_1101-436634 317 25 number number NN 10_1101-436634 317 26 of of IN 10_1101-436634 317 27 reads read NNS 10_1101-436634 317 28 supporting support VBG 10_1101-436634 317 29 that that DT 10_1101-436634 317 30 junction junction NN 10_1101-436634 317 31 by by IN 10_1101-436634 317 32 the the DT 
10_1101-436634 317 33 sum sum NN 10_1101-436634 317 34 of of IN 10_1101-436634 317 35 all all DT 10_1101-436634 317 36 reads read NNS 10_1101-436634 317 37 for for IN 10_1101-436634 317 38 all all DT 10_1101-436634 317 39 junctions junction NNS 10_1101-436634 317 40 within within IN 10_1101-436634 317 41 the the DT 10_1101-436634 317 42 splice splice NN 10_1101-436634 317 43 junction junction NN 10_1101-436634 317 44 region region NN 10_1101-436634 317 45 for for IN 10_1101-436634 317 46 the the DT 10_1101-436634 317 47 variant variant NN 10_1101-436634 317 48 of of IN 10_1101-436634 317 49 interest interest NN 10_1101-436634 317 50 . . . 10_1101-436634 318 1 This this DT 10_1101-436634 318 2 metric metric NN 10_1101-436634 318 3 is be VBZ 10_1101-436634 318 4 conceptually conceptually RB 10_1101-436634 318 5 similar similar JJ 10_1101-436634 318 6 to to IN 10_1101-436634 318 7 a a DT 10_1101-436634 318 8 “ " `` 10_1101-436634 318 9 percent percent NN 10_1101-436634 318 10 - - HYPH 10_1101-436634 318 11 spliced spliced JJ 10_1101-436634 318 12 in in IN 10_1101-436634 318 13 ” " '' 10_1101-436634 318 14 ( ( -LRB- 10_1101-436634 318 15 PSI PSI NNP 10_1101-436634 318 16 ) ) -RRB- 10_1101-436634 318 17 index index NN 10_1101-436634 318 18 , , , 10_1101-436634 318 19 but but CC 10_1101-436634 318 20 measures measure VBZ 10_1101-436634 318 21 the the DT 10_1101-436634 318 22 presence presence NN 10_1101-436634 318 23 of of IN 10_1101-436634 318 24 entire entire JJ 10_1101-436634 318 25 exon exon NN 10_1101-436634 318 26 - - HYPH 10_1101-436634 318 27 exon exon NN 10_1101-436634 318 28 junctions junction NNS 10_1101-436634 318 29 , , , 10_1101-436634 318 30 instead instead RB 10_1101-436634 318 31 of of IN 10_1101-436634 318 32 just just RB 10_1101-436634 318 33 the the DT 10_1101-436634 318 34 inclusion inclusion NN 10_1101-436634 318 35 of of IN 10_1101-436634 318 36 individual individual JJ 10_1101-436634 318 37 exons exon NNS 10_1101-436634 318 38 . . . 
10_1101-436634 319 1 If if IN 10_1101-436634 319 2 there there EX 10_1101-436634 319 3 were be VBD 10_1101-436634 319 4 multiple multiple JJ 10_1101-436634 319 5 samples sample NNS 10_1101-436634 319 6 that that WDT 10_1101-436634 319 7 contained contain VBD 10_1101-436634 319 8 the the DT 10_1101-436634 319 9 variant variant NN 10_1101-436634 319 10 for for IN 10_1101-436634 319 11 the the DT 10_1101-436634 319 12 event event NN 10_1101-436634 319 13 , , , 10_1101-436634 319 14 then then RB 10_1101-436634 319 15 the the DT 10_1101-436634 319 16 mean mean NN 10_1101-436634 319 17 of of IN 10_1101-436634 319 18 the the DT 10_1101-436634 319 19 normalized normalized JJ 10_1101-436634 319 20 scores score NNS 10_1101-436634 319 21 for for IN 10_1101-436634 319 22 the the DT 10_1101-436634 319 23 samples sample NNS 10_1101-436634 319 24 was be VBD 10_1101-436634 319 25 computed compute VBN 10_1101-436634 319 26 ( ( -LRB- 10_1101-436634 319 27 mean_norm_score mean_norm_score NN 10_1101-436634 319 28 ) ) -RRB- 10_1101-436634 319 29 . . . 10_1101-436634 320 1 If if IN 10_1101-436634 320 2 only only RB 10_1101-436634 320 3 one one CD 10_1101-436634 320 4 sample sample NN 10_1101-436634 320 5 contained contain VBD 10_1101-436634 320 6 the the DT 10_1101-436634 320 7 variant variant NN 10_1101-436634 320 8 , , , 10_1101-436634 320 9 its -PRON- PRP$ 10_1101-436634 320 10 mean_norm_score mean_norm_score NN 10_1101-436634 320 11 was be VBD 10_1101-436634 320 12 thus thus RB 10_1101-436634 320 13 equal equal JJ 10_1101-436634 320 14 to to IN 10_1101-436634 320 15 its -PRON- PRP$ 10_1101-436634 320 16 norm_score norm_score NN 10_1101-436634 320 17 . . . 
10_1101-436634 321 1 This this DT 10_1101-436634 321 2 value value NN 10_1101-436634 321 3 was be VBD 10_1101-436634 321 4 then then RB 10_1101-436634 321 5 compared compare VBN 10_1101-436634 321 6 to to IN 10_1101-436634 321 7 the the DT 10_1101-436634 321 8 distribution distribution NN 10_1101-436634 321 9 of of IN 10_1101-436634 321 10 samples sample NNS 10_1101-436634 321 11 which which WDT 10_1101-436634 321 12 did do VBD 10_1101-436634 321 13 not not RB 10_1101-436634 321 14 contain contain VB 10_1101-436634 321 15 the the DT 10_1101-436634 321 16 variant variant NN 10_1101-436634 321 17 to to TO 10_1101-436634 321 18 calculate calculate VB 10_1101-436634 321 19 a a DT 10_1101-436634 321 20 p p NN 10_1101-436634 321 21 - - HYPH 10_1101-436634 321 22 value value NN 10_1101-436634 321 23 as as IN 10_1101-436634 321 24 the the DT 10_1101-436634 321 25 percentage percentage NN 10_1101-436634 321 26 of of IN 10_1101-436634 321 27 the the DT 10_1101-436634 321 28 norm_scores norm_scores NNPS 10_1101-436634 321 29 from from IN 10_1101-436634 321 30 these these DT 10_1101-436634 321 31 samples sample NNS 10_1101-436634 321 32 which which WDT 10_1101-436634 321 33 are be VBP 10_1101-436634 321 34 at at IN 10_1101-436634 321 35 least least JJS 10_1101-436634 321 36 as as RB 10_1101-436634 321 37 high high JJ 10_1101-436634 321 38 as as IN 10_1101-436634 321 39 the the DT 10_1101-436634 321 40 mean_norm_score mean_norm_score NN 10_1101-436634 321 41 computed compute VBD 10_1101-436634 321 42 for for IN 10_1101-436634 321 43 the the DT 10_1101-436634 321 44 variant variant NN 10_1101-436634 321 45 - - HYPH 10_1101-436634 321 46 containing contain VBG 10_1101-436634 321 47 samples sample NNS 10_1101-436634 321 48 . . . 
10_1101-436634 322 1 We -PRON- PRP 10_1101-436634 322 2 performed perform VBD 10_1101-436634 322 3 separate separate JJ 10_1101-436634 322 4 analyses analysis NNS 10_1101-436634 322 5 for for IN 10_1101-436634 322 6 events event NNS 10_1101-436634 322 7 involving involve VBG 10_1101-436634 322 8 canonical canonical JJ 10_1101-436634 322 9 junctions junction NNS 10_1101-436634 322 10 ( ( -LRB- 10_1101-436634 322 11 DA DA NNP 10_1101-436634 322 12 ) ) -RRB- 10_1101-436634 322 13 and and CC 10_1101-436634 322 14 those those DT 10_1101-436634 322 15 involving involve VBG 10_1101-436634 322 16 novel novel JJ 10_1101-436634 322 17 junctions junction NNS 10_1101-436634 322 18 which which WDT 10_1101-436634 322 19 used use VBD 10_1101-436634 322 20 at at RB 10_1101-436634 322 21 least least RBS 10_1101-436634 322 22 one one CD 10_1101-436634 322 23 known know VBN 10_1101-436634 322 24 splice splice NN 10_1101-436634 322 25 site site NN 10_1101-436634 322 26 ( ( -LRB- 10_1101-436634 322 27 D D NNP 10_1101-436634 322 28 / / SYM 10_1101-436634 322 29 A A NNP 10_1101-436634 322 30 / / SYM 10_1101-436634 322 31 NDA NDA NNP 10_1101-436634 322 32 ) ) -RRB- 10_1101-436634 322 33 , , , 10_1101-436634 322 34 based base VBN 10_1101-436634 322 35 on on IN 10_1101-436634 322 36 annotations annotation NNS 10_1101-436634 322 37 in in IN 10_1101-436634 322 38 the the DT 10_1101-436634 322 39 corresponding corresponding JJ 10_1101-436634 322 40 reference reference NN 10_1101-436634 322 41 GTF gtf NN 10_1101-436634 322 42 . . . 
10_1101-436634 323 1 For for IN 10_1101-436634 323 2 this this DT 10_1101-436634 323 3 study study NN 10_1101-436634 323 4 , , , 10_1101-436634 323 5 we -PRON- PRP 10_1101-436634 323 6 filtered filter VBD 10_1101-436634 323 7 out out RP 10_1101-436634 323 8 any any DT 10_1101-436634 323 9 junctions junction NNS 10_1101-436634 323 10 which which WDT 10_1101-436634 323 11 did do VBD 10_1101-436634 323 12 not not RB 10_1101-436634 323 13 use use VB 10_1101-436634 323 14 at at RB 10_1101-436634 323 15 least least RBS 10_1101-436634 323 16 one one CD 10_1101-436634 323 17 known know VBN 10_1101-436634 323 18 splice splice NN 10_1101-436634 323 19 site site NN 10_1101-436634 323 20 ( ( -LRB- 10_1101-436634 323 21 N n NN 10_1101-436634 323 22 ) ) -RRB- 10_1101-436634 323 23 and and CC 10_1101-436634 323 24 junctions junction NNS 10_1101-436634 323 25 which which WDT 10_1101-436634 323 26 did do VBD 10_1101-436634 323 27 not not RB 10_1101-436634 323 28 have have VB 10_1101-436634 323 29 at at RB 10_1101-436634 323 30 least least RBS 10_1101-436634 323 31 5 5 CD 10_1101-436634 323 32 reads read NNS 10_1101-436634 323 33 of of IN 10_1101-436634 323 34 evidence evidence NN 10_1101-436634 323 35 across across IN 10_1101-436634 323 36 variant variant NN 10_1101-436634 323 37 - - HYPH 10_1101-436634 323 38 containing contain VBG 10_1101-436634 323 39 samples sample NNS 10_1101-436634 323 40 . . . 10_1101-436634 324 1 The the DT 10_1101-436634 324 2 Benjamini Benjamini NNP 10_1101-436634 324 3 - - HYPH 10_1101-436634 324 4 Hochberg Hochberg NNP 10_1101-436634 324 5 procedure procedure NN 10_1101-436634 324 6 was be VBD 10_1101-436634 324 7 then then RB 10_1101-436634 324 8 applied apply VBN 10_1101-436634 324 9 to to IN 10_1101-436634 324 10 the the DT 10_1101-436634 324 11 remaining remain VBG 10_1101-436634 324 12 events event NNS 10_1101-436634 324 13 . . . 
10_1101-436634 325 1 Following follow VBG 10_1101-436634 325 2 correction correction NN 10_1101-436634 325 3 , , , 10_1101-436634 325 4 an an DT 10_1101-436634 325 5 event event NN 10_1101-436634 325 6 was be VBD 10_1101-436634 325 7 considered consider VBN 10_1101-436634 325 8 significant significant JJ 10_1101-436634 325 9 if if IN 10_1101-436634 325 10 its -PRON- PRP$ 10_1101-436634 325 11 adjusted adjusted JJ 10_1101-436634 325 12 p p NN 10_1101-436634 325 13 - - HYPH 10_1101-436634 325 14 value value NN 10_1101-436634 325 15 was be VBD 10_1101-436634 325 16 ≤ ≤ NNP 10_1101-436634 325 17 0.05 0.05 CD 10_1101-436634 325 18 . . . 10_1101-436634 326 1 Annotation annotation NN 10_1101-436634 326 2 with with IN 10_1101-436634 326 3 GTEx GTEx NNP 10_1101-436634 326 4 junction junction NN 10_1101-436634 326 5 data datum NNS 10_1101-436634 326 6 and and CC 10_1101-436634 326 7 other other JJ 10_1101-436634 326 8 splice splice NN 10_1101-436634 326 9 prediction prediction NN 10_1101-436634 326 10 tools tool NNS 10_1101-436634 326 11 Events event NNS 10_1101-436634 326 12 identified identify VBN 10_1101-436634 326 13 by by IN 10_1101-436634 326 14 RegTools regtool NNS 10_1101-436634 326 15 as as IN 10_1101-436634 326 16 significant significant JJ 10_1101-436634 326 17 were be VBD 10_1101-436634 326 18 annotated annotate VBN 10_1101-436634 326 19 with with IN 10_1101-436634 326 20 information information NN 10_1101-436634 326 21 from from IN 10_1101-436634 326 22 GTEx GTEx NNS 10_1101-436634 326 23 , , , 10_1101-436634 326 24 VEP VEP NNP 10_1101-436634 326 25 , , , 10_1101-436634 326 26 SpliceAI SpliceAI NNP 10_1101-436634 326 27 , , , 10_1101-436634 326 28 MiSplice MiSplice NNP 10_1101-436634 326 29 , , , 10_1101-436634 326 30 and and CC 10_1101-436634 326 31 Veridical veridical JJ 10_1101-436634 326 32 . . . 
10_1101-436634 327 1 GTEx GTEx NNS 10_1101-436634 327 2 junction junction NN 10_1101-436634 327 3 information information NN 10_1101-436634 327 4 was be VBD 10_1101-436634 327 5 obtained obtain VBN 10_1101-436634 327 6 from from IN 10_1101-436634 327 7 the the DT 10_1101-436634 327 8 GTEx GTEx NNP 10_1101-436634 327 9 Portal Portal NNP 10_1101-436634 327 10 . . . 10_1101-436634 328 1 Specifically specifically RB 10_1101-436634 328 2 , , , 10_1101-436634 328 3 the the DT 10_1101-436634 328 4 exon exon NN 10_1101-436634 328 5 - - HYPH 10_1101-436634 328 6 exon exon JJ 10_1101-436634 328 7 junction junction NN 10_1101-436634 328 8 read read VBD 10_1101-436634 328 9 counts count NNS 10_1101-436634 328 10 file file NN 10_1101-436634 328 11 from from IN 10_1101-436634 328 12 the the DT 10_1101-436634 328 13 v8 v8 NN 10_1101-436634 328 14 release release NN 10_1101-436634 328 15 was be VBD 10_1101-436634 328 16 used use VBN 10_1101-436634 328 17 for for IN 10_1101-436634 328 18 data datum NNS 10_1101-436634 328 19 aligned align VBN 10_1101-436634 328 20 to to IN 10_1101-436634 328 21 GRCh38 GRCh38 NNP 10_1101-436634 328 22 while while IN 10_1101-436634 328 23 the the DT 10_1101-436634 328 24 same same JJ 10_1101-436634 328 25 file file NN 10_1101-436634 328 26 from from IN 10_1101-436634 328 27 the the DT 10_1101-436634 328 28 v7 v7 NNP 10_1101-436634 328 29 release release NN 10_1101-436634 328 30 was be VBD 10_1101-436634 328 31 used use VBN 10_1101-436634 328 32 for for IN 10_1101-436634 328 33 the the DT 10_1101-436634 328 34 data datum NNS 10_1101-436634 328 35 aligned align VBN 10_1101-436634 328 36 to to IN 10_1101-436634 329 31 GRCh37 GRCh37 NNS 10_1101-436634 329 32 . . . 
10_1101-436634 330 1 Mappings mapping NNS 10_1101-436634 330 2 between between IN 10_1101-436634 330 3 tumor tumor NN 10_1101-436634 330 4 cohorts cohort NNS 10_1101-436634 330 5 and and CC 10_1101-436634 330 6 GTEx GTEx NNS 10_1101-436634 330 7 tissues tissue NNS 10_1101-436634 330 8 can can MD 10_1101-436634 330 9 be be VB 10_1101-436634 330 10 found find VBN 10_1101-436634 330 11 in in IN 10_1101-436634 330 12 Supplemental Supplemental NNP 10_1101-436634 330 13 File File NNP 10_1101-436634 330 14 7 7 CD 10_1101-436634 330 15 . . . 10_1101-436634 331 1 We -PRON- PRP 10_1101-436634 331 2 annotated annotate VBD 10_1101-436634 331 3 all all DT 10_1101-436634 331 4 starting start VBG 10_1101-436634 331 5 variants variant NNS 10_1101-436634 331 6 with with IN 10_1101-436634 331 7 VEP VEP NNP 10_1101-436634 331 8 in in IN 10_1101-436634 331 9 the the DT 10_1101-436634 331 10 “ " `` 10_1101-436634 331 11 per_gene per_gene NNP 10_1101-436634 331 12 ” " '' 10_1101-436634 331 13 and and CC 10_1101-436634 331 14 “ " `` 10_1101-436634 331 15 pick pick VB 10_1101-436634 331 16 ” " '' 10_1101-436634 331 17 modes mode NNS 10_1101-436634 331 18 . . . 
10_1101-436634 332 1 The the DT 10_1101-436634 332 2 “ " `` 10_1101-436634 332 3 per_gene per_gene NNP 10_1101-436634 332 4 ” " '' 10_1101-436634 332 5 setting set VBG 10_1101-436634 332 6 outputs output NNS 10_1101-436634 332 7 only only RB 10_1101-436634 332 8 the the DT 10_1101-436634 332 9 most most RBS 10_1101-436634 332 10 severe severe JJ 10_1101-436634 332 11 consequence consequence NN 10_1101-436634 332 12 per per IN 10_1101-436634 332 13 gene gene NN 10_1101-436634 332 14 while while IN 10_1101-436634 332 15 the the DT 10_1101-436634 332 16 “ " `` 10_1101-436634 332 17 pick pick NN 10_1101-436634 332 18 ” " '' 10_1101-436634 332 19 setting set VBG 10_1101-436634 332 20 picks pick NNS 10_1101-436634 332 21 one one CD 10_1101-436634 332 22 line line NN 10_1101-436634 332 23 or or CC 10_1101-436634 332 24 block block NN 10_1101-436634 332 25 of of IN 10_1101-436634 332 26 consequence consequence NN 10_1101-436634 332 27 data datum NNS 10_1101-436634 332 28 per per IN 10_1101-436634 332 29 variant variant NN 10_1101-436634 332 30 . . . 10_1101-436634 333 1 We -PRON- PRP 10_1101-436634 333 2 considered consider VBD 10_1101-436634 333 3 any any DT 10_1101-436634 333 4 variant variant NN 10_1101-436634 333 5 with with IN 10_1101-436634 333 6 at at RB 10_1101-436634 333 7 least least RBS 10_1101-436634 333 8 one one CD 10_1101-436634 333 9 splicing splicing NN 10_1101-436634 333 10 - - HYPH 10_1101-436634 333 11 related relate VBN 10_1101-436634 333 12 annotation annotation NN 10_1101-436634 333 13 to to TO 10_1101-436634 333 14 be be VB 10_1101-436634 333 15 “ " `` 10_1101-436634 333 16 VEP VEP NNP 10_1101-436634 333 17 significant significant JJ 10_1101-436634 333 18 ” " '' 10_1101-436634 333 19 . . . 
10_1101-436634 334 1 All all DT 10_1101-436634 334 2 variants variant NNS 10_1101-436634 334 3 were be VBD 10_1101-436634 334 4 also also RB 10_1101-436634 334 5 processed process VBN 10_1101-436634 334 6 with with IN 10_1101-436634 334 7 SpliceAI SpliceAI NNP 10_1101-436634 334 8 using use VBG 10_1101-436634 334 9 the the DT 10_1101-436634 334 10 default default NN 10_1101-436634 334 11 options option NNS 10_1101-436634 334 12 . . . 10_1101-436634 335 1 A a DT 10_1101-436634 335 2 variant variant NN 10_1101-436634 335 3 was be VBD 10_1101-436634 335 4 considered consider VBN 10_1101-436634 335 5 to to TO 10_1101-436634 335 6 be be VB 10_1101-436634 335 7 “ " `` 10_1101-436634 335 8 SpliceAI spliceai JJ 10_1101-436634 335 9 significant significant JJ 10_1101-436634 335 10 ” " '' 10_1101-436634 335 11 if if IN 10_1101-436634 335 12 it -PRON- PRP 10_1101-436634 335 13 had have VBD 10_1101-436634 335 14 at at RB 10_1101-436634 335 15 least least RBS 10_1101-436634 335 16 one one CD 10_1101-436634 335 17 score score NN 10_1101-436634 335 18 greater great JJR 10_1101-436634 335 19 than than IN 10_1101-436634 335 20 0.2 0.2 CD 10_1101-436634 335 21 , , , 10_1101-436634 335 22 the the DT 10_1101-436634 335 23 developers developer NNS 10_1101-436634 335 24 ’ ’ POS 10_1101-436634 335 25 value value NN 10_1101-436634 335 26 for for IN 10_1101-436634 335 27 high high JJ 10_1101-436634 335 28 recall recall NN 10_1101-436634 335 29 of of IN 10_1101-436634 335 30 their -PRON- PRP$ 10_1101-436634 335 31 model model NN 10_1101-436634 335 32 . . . 
10_1101-436634 336 1 Variants variant NNS 10_1101-436634 336 2 identified identify VBN 10_1101-436634 336 3 by by IN 10_1101-436634 336 4 MiSplice20 misplice20 CD 10_1101-436634 336 5 were be VBD 10_1101-436634 336 6 obtained obtain VBN 10_1101-436634 336 7 from from IN 10_1101-436634 336 8 the the DT 10_1101-436634 336 9 paper paper NN 10_1101-436634 336 10 supplemental supplemental JJ 10_1101-436634 336 11 tables table NNS 10_1101-436634 336 12 and and CC 10_1101-436634 336 13 were be VBD 10_1101-436634 336 14 lifted lift VBN 10_1101-436634 336 15 over over RP 10_1101-436634 336 16 to to IN 10_1101-436634 336 17 GRCh38 GRCh38 NNP 10_1101-436634 336 18 . . . 10_1101-436634 337 1 Variants variant NNS 10_1101-436634 337 2 identified identify VBN 10_1101-436634 337 3 by by IN 10_1101-436634 337 4 SAVNet23 SAVNet23 NNP 10_1101-436634 337 5 were be VBD 10_1101-436634 337 6 obtained obtain VBN 10_1101-436634 337 7 from from IN 10_1101-436634 337 8 the the DT 10_1101-436634 337 9 paper paper NN 10_1101-436634 337 10 supplemental supplemental JJ 10_1101-436634 337 11 tables table NNS 10_1101-436634 337 12 and and CC 10_1101-436634 337 13 were be VBD 10_1101-436634 337 14 lifted lift VBN 10_1101-436634 337 15 over over RP 10_1101-436634 337 16 to to IN 10_1101-436634 337 17 GRCh38 GRCh38 NNP 10_1101-436634 337 18 . . . 
10_1101-436634 338 1 Variants variant NNS 10_1101-436634 338 2 identified identify VBN 10_1101-436634 338 3 by by IN 10_1101-436634 338 4 Veridical21,22 Veridical21,22 NNP 10_1101-436634 338 5 were be VBD 10_1101-436634 338 6 obtained obtain VBN 10_1101-436634 338 7 via via IN 10_1101-436634 338 8 download download NN 10_1101-436634 338 9 from from IN 10_1101-436634 338 10 the the DT 10_1101-436634 338 11 link link NN 10_1101-436634 338 12 reference reference NN 10_1101-436634 338 13 within within IN 10_1101-436634 338 14 the the DT 10_1101-436634 338 15 manuscript manuscript NN 10_1101-436634 338 16 and and CC 10_1101-436634 338 17 lifted lift VBD 10_1101-436634 338 18 over over RP 10_1101-436634 338 19 to to IN 10_1101-436634 338 20 GRCh38 GRCh38 NNP 10_1101-436634 338 21 . . . 10_1101-436634 339 1 Visual visual JJ 10_1101-436634 339 2 exploration exploration NN 10_1101-436634 339 3 of of IN 10_1101-436634 339 4 statistically statistically RB 10_1101-436634 339 5 significant significant JJ 10_1101-436634 339 6 candidate candidate NN 10_1101-436634 339 7 events event NNS 10_1101-436634 339 8 IGV IGV NNP 10_1101-436634 339 9 sessions session NNS 10_1101-436634 339 10 were be VBD 10_1101-436634 339 11 created create VBN 10_1101-436634 339 12 for for IN 10_1101-436634 339 13 each each DT 10_1101-436634 339 14 event event NN 10_1101-436634 339 15 identified identify VBN 10_1101-436634 339 16 by by IN 10_1101-436634 339 17 RegTools regtool NNS 10_1101-436634 339 18 that that WDT 10_1101-436634 339 19 was be VBD 10_1101-436634 339 20 statistically statistically RB 10_1101-436634 339 21 significant significant JJ 10_1101-436634 339 22 . . . 
10_1101-436634 340 1 Each each DT 10_1101-436634 340 2 IGV igv NN 10_1101-436634 340 3 session session NN 10_1101-436634 340 4 file file NN 10_1101-436634 340 5 contained contain VBD 10_1101-436634 340 6 a a DT 10_1101-436634 340 7 bed bed NN 10_1101-436634 340 8 file file NN 10_1101-436634 340 9 with with IN 10_1101-436634 340 10 the the DT 10_1101-436634 340 11 junction junction NN 10_1101-436634 340 12 , , , 10_1101-436634 340 13 a a DT 10_1101-436634 340 14 vcf vcf NN 10_1101-436634 340 15 file file NN 10_1101-436634 340 16 with with IN 10_1101-436634 340 17 the the DT 10_1101-436634 340 18 variant variant NN 10_1101-436634 340 19 , , , 10_1101-436634 340 20 and and CC 10_1101-436634 340 21 an an DT 10_1101-436634 340 22 alignment alignment NN 10_1101-436634 340 23 file file NN 10_1101-436634 340 24 for for IN 10_1101-436634 340 25 each each DT 10_1101-436634 340 26 sample sample NN 10_1101-436634 340 27 that that WDT 10_1101-436634 340 28 contained contain VBD 10_1101-436634 340 29 the the DT 10_1101-436634 340 30 variant variant NN 10_1101-436634 340 31 . . . 
10_1101-436634 341 1 Additional additional JJ 10_1101-436634 341 2 information information NN 10_1101-436634 341 3 , , , 10_1101-436634 341 4 such such JJ 10_1101-436634 341 5 as as IN 10_1101-436634 341 6 the the DT 10_1101-436634 341 7 splice splice NN 10_1101-436634 341 8 sites site NNS 10_1101-436634 341 9 predicted predict VBN 10_1101-436634 341 10 by by IN 10_1101-436634 341 11 SpliceAI spliceai CD 10_1101-436634 341 12 , , , 10_1101-436634 341 13 were be VBD 10_1101-436634 341 14 also also RB 10_1101-436634 341 15 added add VBN 10_1101-436634 341 16 to to IN 10_1101-436634 341 17 these these DT 10_1101-436634 341 18 session session NN 10_1101-436634 341 19 files file NNS 10_1101-436634 341 20 to to TO 10_1101-436634 341 21 enhance enhance VB 10_1101-436634 341 22 the the DT 10_1101-436634 341 23 exploration exploration NN 10_1101-436634 341 24 of of IN 10_1101-436634 341 25 these these DT 10_1101-436634 341 26 events event NNS 10_1101-436634 341 27 . . . 10_1101-436634 342 1 Events event NNS 10_1101-436634 342 2 of of IN 10_1101-436634 342 3 interest interest NN 10_1101-436634 342 4 were be VBD 10_1101-436634 342 5 manually manually RB 10_1101-436634 342 6 reviewed review VBN 10_1101-436634 342 7 in in IN 10_1101-436634 342 8 IGV IGV NNP 10_1101-436634 342 9 to to TO 10_1101-436634 342 10 assess assess VB 10_1101-436634 342 11 whether whether IN 10_1101-436634 342 12 the the DT 10_1101-436634 342 13 association association NN 10_1101-436634 342 14 between between IN 10_1101-436634 342 15 the the DT 10_1101-436634 342 16 variant variant JJ 10_1101-436634 342 17 and and CC 10_1101-436634 342 18 junction junction NN 10_1101-436634 342 19 made make VBD 10_1101-436634 342 20 sense sense NN 10_1101-436634 342 21 in in IN 10_1101-436634 342 22 a a DT 10_1101-436634 342 23 biological biological JJ 10_1101-436634 342 24 context context NN 10_1101-436634 342 25 ( ( -LRB- 10_1101-436634 342 26 e.g. e.g. 
RB 10_1101-436634 343 1 affected affect VBD 10_1101-436634 343 2 a a DT 10_1101-436634 343 3 known know VBN 10_1101-436634 343 4 splice splice NN 10_1101-436634 343 5 site site NN 10_1101-436634 343 6 , , , 10_1101-436634 343 7 altered alter VBD 10_1101-436634 343 8 a a DT 10_1101-436634 343 9 genomic genomic JJ 10_1101-436634 343 10 sequence sequence NN 10_1101-436634 343 11 to to TO 10_1101-436634 343 12 look look VB 10_1101-436634 343 13 more more JJR 10_1101-436634 343 14 like like IN 10_1101-436634 343 15 a a DT 10_1101-436634 343 16 canonical canonical JJ 10_1101-436634 343 17 splice splice NN 10_1101-436634 343 18 site site NN 10_1101-436634 343 19 , , , 10_1101-436634 343 20 or or CC 10_1101-436634 343 21 the the DT 10_1101-436634 343 22 novel novel JJ 10_1101-436634 343 23 junction junction NN 10_1101-436634 343 24 disrupted disrupt VBD 10_1101-436634 343 25 active active JJ 10_1101-436634 343 26 or or CC 10_1101-436634 343 27 regulatory regulatory JJ 10_1101-436634 343 28 domains domain NNS 10_1101-436634 343 29 of of IN 10_1101-436634 343 30 the the DT 10_1101-436634 343 31 protein protein NN 10_1101-436634 343 32 product product NN 10_1101-436634 343 33 ) ) -RRB- 10_1101-436634 343 34 . . . 
10_1101-436634 344 1 An an DT 10_1101-436634 344 2 extensive extensive JJ 10_1101-436634 344 3 review review NN 10_1101-436634 344 4 of of IN 10_1101-436634 344 5 literature literature NN 10_1101-436634 344 6 and and CC 10_1101-436634 344 7 visualizations visualization NNS 10_1101-436634 344 8 of of IN 10_1101-436634 344 9 junction junction NN 10_1101-436634 344 10 usage usage NN 10_1101-436634 344 11 in in IN 10_1101-436634 344 12 the the DT 10_1101-436634 344 13 presence presence NN 10_1101-436634 344 14 and and CC 10_1101-436634 344 15 absence absence NN 10_1101-436634 344 16 of of IN 10_1101-436634 344 17 the the DT 10_1101-436634 344 18 variant variant NN 10_1101-436634 344 19 were be VBD 10_1101-436634 344 20 also also RB 10_1101-436634 344 21 used use VBN 10_1101-436634 344 22 to to TO 10_1101-436634 344 23 identify identify VB 10_1101-436634 344 24 novel novel NN 10_1101-436634 344 25 , , , 10_1101-436634 344 26 biologically biologically RB 10_1101-436634 344 27 relevant relevant JJ 10_1101-436634 344 28 events event NNS 10_1101-436634 344 29 . . . 
10_1101-436634 345 1 Identification identification NN 10_1101-436634 345 2 of of IN 10_1101-436634 345 3 genes gene NNS 10_1101-436634 345 4 with with IN 10_1101-436634 345 5 recurrent recurrent JJ 10_1101-436634 345 6 splice splice NN 10_1101-436634 345 7 altering alter VBG 10_1101-436634 345 8 variants variant NNS 10_1101-436634 345 9 For for IN 10_1101-436634 345 10 each each DT 10_1101-436634 345 11 cohort cohort NN 10_1101-436634 345 12 , , , 10_1101-436634 345 13 we -PRON- PRP 10_1101-436634 345 14 calculated calculate VBD 10_1101-436634 345 15 a a DT 10_1101-436634 345 16 p p NN 10_1101-436634 345 17 - - HYPH 10_1101-436634 345 18 value value NN 10_1101-436634 345 19 to to TO 10_1101-436634 345 20 assess assess VB 10_1101-436634 345 21 whether whether IN 10_1101-436634 345 22 the the DT 10_1101-436634 345 23 splicing splicing NN 10_1101-436634 345 24 profile profile NN 10_1101-436634 345 25 from from IN 10_1101-436634 345 26 a a DT 10_1101-436634 345 27 particular particular JJ 10_1101-436634 345 28 gene gene NN 10_1101-436634 345 29 was be VBD 10_1101-436634 345 30 significantly significantly RB 10_1101-436634 345 31 more more RBR 10_1101-436634 345 32 likely likely JJ 10_1101-436634 345 33 to to TO 10_1101-436634 345 34 be be VB 10_1101-436634 345 35 altered alter VBN 10_1101-436634 345 36 by by IN 10_1101-436634 345 37 somatic somatic JJ 10_1101-436634 345 38 variants variant NNS 10_1101-436634 345 39 . . . 
10_1101-436634 346 1 Specifically specifically RB 10_1101-436634 346 2 , , , 10_1101-436634 346 3 we -PRON- PRP 10_1101-436634 346 4 performed perform VBD 10_1101-436634 346 5 a a DT 10_1101-436634 346 6 1-tailed 1-tailed CD 10_1101-436634 346 7 binomial binomial JJ 10_1101-436634 346 8 test test NN 10_1101-436634 346 9 , , , 10_1101-436634 346 10 considering consider VBG 10_1101-436634 346 11 the the DT 10_1101-436634 346 12 number number NN 10_1101-436634 346 13 of of IN 10_1101-436634 346 14 samples sample NNS 10_1101-436634 346 15 in in IN 10_1101-436634 346 16 a a DT 10_1101-436634 346 17 cohort cohort NN 10_1101-436634 346 18 as as IN 10_1101-436634 346 19 the the DT 10_1101-436634 346 20 number number NN 10_1101-436634 346 21 of of IN 10_1101-436634 346 22 attempts attempt NNS 10_1101-436634 346 23 . . . 10_1101-436634 347 1 Success success NN 10_1101-436634 347 2 was be VBD 10_1101-436634 347 3 defined define VBN 10_1101-436634 347 4 by by IN 10_1101-436634 347 5 whether whether IN 10_1101-436634 347 6 the the DT 10_1101-436634 347 7 sample sample NN 10_1101-436634 347 8 had have VBD 10_1101-436634 347 9 evidence evidence NN 10_1101-436634 347 10 of of IN 10_1101-436634 347 11 at at RB 10_1101-436634 347 12 least least RBS 10_1101-436634 347 13 one one CD 10_1101-436634 347 14 splice splice NN 10_1101-436634 347 15 - - HYPH 10_1101-436634 347 16 altering alter VBG 10_1101-436634 347 17 variant variant NN 10_1101-436634 347 18 in in IN 10_1101-436634 347 19 that that DT 10_1101-436634 347 20 gene gene NN 10_1101-436634 347 21 . . . 
10_1101-436634 348 1 The the DT 10_1101-436634 348 2 null null JJ 10_1101-436634 348 3 probability probability NN 10_1101-436634 348 4 of of IN 10_1101-436634 348 5 success success NN 10_1101-436634 348 6 , , , 10_1101-436634 348 7 pnull pnull JJ 10_1101-436634 348 8 was be VBD 10_1101-436634 348 9 calculated calculate VBN 10_1101-436634 348 10 as as IN 10_1101-436634 348 11 where where WRB 10_1101-436634 348 12 s s POS 10_1101-436634 348 13 is be VBZ 10_1101-436634 348 14 the the DT 10_1101-436634 348 15 total total JJ 10_1101-436634 348 16 number number NN 10_1101-436634 348 17 of of IN 10_1101-436634 348 18 base base NN 10_1101-436634 348 19 positions position NNS 10_1101-436634 348 20 residing reside VBG 10_1101-436634 348 21 in in IN 10_1101-436634 348 22 any any DT 10_1101-436634 348 23 of of IN 10_1101-436634 348 24 the the DT 10_1101-436634 348 25 gene gene NN 10_1101-436634 348 26 ’s ’s POS 10_1101-436634 348 27 splice splice NN 10_1101-436634 348 28 variant variant JJ 10_1101-436634 348 29 windows window NNS 10_1101-436634 348 30 , , , 10_1101-436634 348 31 V V NNP 10_1101-436634 348 32 is be VBZ 10_1101-436634 348 33 the the DT 10_1101-436634 348 34 event event NN 10_1101-436634 348 35 that that WDT 10_1101-436634 348 36 a a DT 10_1101-436634 348 37 somatic somatic JJ 10_1101-436634 348 38 variant variant NN 10_1101-436634 348 39 occurred occur VBD 10_1101-436634 348 40 at at IN 10_1101-436634 348 41 such such PDT 10_1101-436634 348 42 a a DT 10_1101-436634 348 43 base base NN 10_1101-436634 348 44 position position NN 10_1101-436634 348 45 , , , 10_1101-436634 348 46 and and CC 10_1101-436634 348 47 A a DT 10_1101-436634 348 48 is be VBZ 10_1101-436634 348 49 the the DT 10_1101-436634 348 50 event event NN 10_1101-436634 348 51 that that IN 10_1101-436634 348 52 this this DT 10_1101-436634 348 53 variant variant NN 10_1101-436634 348 54 was be VBD 10_1101-436634 348 55 deemed deem VBN 10_1101-436634 348 56 to to TO 10_1101-436634 348 57 be be VB 
10_1101-436634 348 58 significantly significantly RB 10_1101-436634 348 59 associated associate VBN 10_1101-436634 348 60 with with IN 10_1101-436634 348 61 at at RB 10_1101-436634 348 62 least least RBS 10_1101-436634 348 63 one one CD 10_1101-436634 348 64 junction junction NN 10_1101-436634 348 65 in in IN 10_1101-436634 348 66 our -PRON- PRP$ 10_1101-436634 348 67 analysis analysis NN 10_1101-436634 348 68 . . . 10_1101-436634 349 1 The the DT 10_1101-436634 349 2 joint joint JJ 10_1101-436634 349 3 probability probability NN 10_1101-436634 349 4 that that IN 10_1101-436634 349 5 both both CC 10_1101-436634 349 6 V V NNP 10_1101-436634 349 7 and and CC 10_1101-436634 349 8 A a DT 10_1101-436634 349 9 occurred occur VBN 10_1101-436634 349 10 was be VBD 10_1101-436634 349 11 estimated estimate VBN 10_1101-436634 349 12 by by IN 10_1101-436634 349 13 dividing divide VBG 10_1101-436634 349 14 the the DT 10_1101-436634 349 15 total total NN 10_1101-436634 349 16 of of IN 10_1101-436634 349 17 events event NNS 10_1101-436634 349 18 across across IN 10_1101-436634 349 19 all all DT 10_1101-436634 349 20 samples sample NNS 10_1101-436634 349 21 in in IN 10_1101-436634 349 22 which which WDT 10_1101-436634 349 23 each each DT 10_1101-436634 349 24 junction junction NN 10_1101-436634 349 25 was be VBD 10_1101-436634 349 26 detected detect VBN 10_1101-436634 349 27 by by IN 10_1101-436634 349 28 s. s. 
NN 10_1101-436634 349 29 The the DT 10_1101-436634 349 30 value value NN 10_1101-436634 349 31 of of IN 10_1101-436634 349 32 s s POS 10_1101-436634 349 33 was be VBD 10_1101-436634 349 34 computed compute VBN 10_1101-436634 349 35 based base VBN 10_1101-436634 349 36 on on IN 10_1101-436634 349 37 the the DT 10_1101-436634 349 38 exon exon NN 10_1101-436634 349 39 and and CC 10_1101-436634 349 40 transcript transcript JJ 10_1101-436634 349 41 definitions definition NNS 10_1101-436634 349 42 in in IN 10_1101-436634 349 43 the the DT 10_1101-436634 349 44 reference reference NN 10_1101-436634 349 45 GTF gtf NN 10_1101-436634 349 46 used use VBN 10_1101-436634 349 47 for for IN 10_1101-436634 349 48 performing perform VBG 10_1101-436634 349 49 RegTools RegTools NNP 10_1101-436634 349 50 analyses analysis NNS 10_1101-436634 349 51 on on IN 10_1101-436634 349 52 a a DT 10_1101-436634 349 53 given give VBN 10_1101-436634 349 54 cohort cohort NN 10_1101-436634 349 55 . . . 
10_1101-436634 351 31 We -PRON- PRP 10_1101-436634 351 32 also also RB 10_1101-436634 351 33 calculated calculate VBD 10_1101-436634 351 34 overall overall JJ 10_1101-436634 351 35 metrics metric NNS 10_1101-436634 351 36 , , , 10_1101-436634 351 37 in in IN 10_1101-436634 351 38 order order NN 10_1101-436634 351 39 to to TO 10_1101-436634 351 40 rank rank VB 10_1101-436634 351 41 genes gene NNS 10_1101-436634 351 42 . . . 
10_1101-436634 352 1 For for IN 10_1101-436634 352 2 each each DT 10_1101-436634 352 3 set set NN 10_1101-436634 352 4 of of IN 10_1101-436634 352 5 cohorts cohort NNS 10_1101-436634 352 6 ( ( -LRB- 10_1101-436634 352 7 e.g. e.g. RB 10_1101-436634 353 1 TCGA- TCGA- NNP 10_1101-436634 353 2 only only RB 10_1101-436634 353 3 , , , 10_1101-436634 353 4 MGI MGI NNP 10_1101-436634 353 5 - - HYPH 10_1101-436634 353 6 only only RB 10_1101-436634 353 7 , , , 10_1101-436634 353 8 combined combine VBN 10_1101-436634 353 9 ) ) -RRB- 10_1101-436634 353 10 , , , 10_1101-436634 353 11 an an DT 10_1101-436634 353 12 overall overall JJ 10_1101-436634 353 13 p p NN 10_1101-436634 353 14 - - HYPH 10_1101-436634 353 15 value value NN 10_1101-436634 353 16 was be VBD 10_1101-436634 353 17 computed compute VBN 10_1101-436634 353 18 for for IN 10_1101-436634 353 19 each each DT 10_1101-436634 353 20 gene gene NN 10_1101-436634 353 21 according accord VBG 10_1101-436634 353 22 to to IN 10_1101-436634 353 23 the the DT 10_1101-436634 353 24 above above JJ 10_1101-436634 353 25 formula formula NN 10_1101-436634 353 26 , , , 10_1101-436634 353 27 pooling pool VBG 10_1101-436634 353 28 all all DT 10_1101-436634 353 29 of of IN 10_1101-436634 353 30 the the DT 10_1101-436634 353 31 samples sample NNS 10_1101-436634 353 32 across across IN 10_1101-436634 353 33 the the DT 10_1101-436634 353 34 included include VBN 10_1101-436634 353 35 cohorts cohort NNS 10_1101-436634 353 36 , , , 10_1101-436634 353 37 and and CC 10_1101-436634 353 38 the the DT 10_1101-436634 353 39 fraction fraction NN 10_1101-436634 353 40 of of IN 10_1101-436634 353 41 samples sample NNS 10_1101-436634 353 42 was be VBD 10_1101-436634 353 43 simply simply RB 10_1101-436634 353 44 calculated calculate VBN 10_1101-436634 353 45 by by IN 10_1101-436634 353 46 dividing divide VBG 10_1101-436634 353 47 the the DT 10_1101-436634 353 48 number number NN 10_1101-436634 353 49 of of IN 10_1101-436634 353 50 samples sample NNS 
10_1101-436634 353 51 in in IN 10_1101-436634 353 52 which which WDT 10_1101-436634 353 53 an an DT 10_1101-436634 353 54 event event NN 10_1101-436634 353 55 occurred occur VBD 10_1101-436634 353 56 within within IN 10_1101-436634 353 57 the the DT 10_1101-436634 353 58 given give VBN 10_1101-436634 353 59 gene gene NN 10_1101-436634 353 60 by by IN 10_1101-436634 353 61 the the DT 10_1101-436634 353 62 total total JJ 10_1101-436634 353 63 number number NN 10_1101-436634 353 64 of of IN 10_1101-436634 353 65 samples sample NNS 10_1101-436634 353 66 , , , 10_1101-436634 353 67 pooled pool VBN 10_1101-436634 353 68 across across IN 10_1101-436634 353 69 the the DT 10_1101-436634 353 70 included included JJ 10_1101-436634 353 71 cohorts cohort NNS 10_1101-436634 353 72 . . . 10_1101-436634 354 1 The the DT 10_1101-436634 354 2 reference reference NN 10_1101-436634 354 3 GTF gtf NN 10_1101-436634 354 4 used use VBN 10_1101-436634 354 5 for for IN 10_1101-436634 354 6 analyzing analyze VBG 10_1101-436634 354 7 the the DT 10_1101-436634 354 8 TCGA TCGA NNP 10_1101-436634 354 9 samples sample NNS 10_1101-436634 354 10 ( ( -LRB- 10_1101-436634 354 11 i.e. i.e. FW 10_1101-436634 355 1 gencode.v29.annotation.gtf gencode.v29.annotation.gtf NNP 10_1101-436634 355 2 ) ) -RRB- 10_1101-436634 355 3 was be VBD 10_1101-436634 355 4 used use VBN 10_1101-436634 355 5 for for IN 10_1101-436634 355 6 all all DT 10_1101-436634 355 7 sets set NNS 10_1101-436634 355 8 of of IN 10_1101-436634 355 9 cohorts cohort NNS 10_1101-436634 355 10 . . . 
10_1101-436634 356 1 Code code NN 10_1101-436634 356 2 availability availability NN 10_1101-436634 356 3 RegTools RegTools NNP 10_1101-436634 356 4 is be VBZ 10_1101-436634 356 5 open open JJ 10_1101-436634 356 6 source source NN 10_1101-436634 356 7 ( ( -LRB- 10_1101-436634 356 8 MIT MIT NNP 10_1101-436634 356 9 license license NN 10_1101-436634 356 10 ) ) -RRB- 10_1101-436634 356 11 and and CC 10_1101-436634 356 12 available available JJ 10_1101-436634 356 13 at at IN 10_1101-436634 356 14 https://github.com/griffithlab/regtools/. https://github.com/griffithlab/regtools/. NNP 10_1101-436634 357 1 All all DT 10_1101-436634 357 2 scripts script NNS 10_1101-436634 357 3 used use VBN 10_1101-436634 357 4 in in IN 10_1101-436634 357 5 the the DT 10_1101-436634 357 6 analyses analysis NNS 10_1101-436634 357 7 presented present VBN 10_1101-436634 357 8 here here RB 10_1101-436634 357 9 are be VBP 10_1101-436634 357 10 also also RB 10_1101-436634 357 11 provided provide VBN 10_1101-436634 357 12 . . . 10_1101-436634 358 1 For for IN 10_1101-436634 358 2 ease ease NN 10_1101-436634 358 3 of of IN 10_1101-436634 358 4 use use NN 10_1101-436634 358 5 , , , 10_1101-436634 358 6 a a DT 10_1101-436634 358 7 Docker Docker NNP 10_1101-436634 358 8 container container NN 10_1101-436634 358 9 has have VBZ 10_1101-436634 358 10 been be VBN 10_1101-436634 358 11 created create VBN 10_1101-436634 358 12 with with IN 10_1101-436634 358 13 RegTools RegTools NNP 10_1101-436634 358 14 , , , 10_1101-436634 358 15 R r NN 10_1101-436634 358 16 , , , 10_1101-436634 358 17 and and CC 10_1101-436634 358 18 Python Python NNP 10_1101-436634 358 19 3 3 CD 10_1101-436634 358 20 installed instal VBN 10_1101-436634 358 21 ( ( -LRB- 10_1101-436634 358 22 https://hub.docker.com/r/griffithlab/regtools/ https://hub.docker.com/r/griffithlab/regtools/ NNP 10_1101-436634 358 23 ) ) -RRB- 10_1101-436634 358 24 . . . 
10_1101-436634 359 1 This this DT 10_1101-436634 359 2 Docker Docker NNP 10_1101-436634 359 3 container container NN 10_1101-436634 359 4 allows allow VBZ 10_1101-436634 359 5 a a DT 10_1101-436634 359 6 user user NN 10_1101-436634 359 7 to to TO 10_1101-436634 359 8 run run VB 10_1101-436634 359 9 the the DT 10_1101-436634 359 10 workflow workflow NN 10_1101-436634 359 11 we -PRON- PRP 10_1101-436634 359 12 outline outline VBP 10_1101-436634 359 13 at at IN 10_1101-436634 359 14 https://regtools.readthedocs.io/en/latest/workflow/. https://regtools.readthedocs.io/en/latest/workflow/. NNP 10_1101-436634 360 1 Docker Docker NNP 10_1101-436634 360 2 is be VBZ 10_1101-436634 360 3 an an DT 10_1101-436634 360 4 open- open- JJ 10_1101-436634 360 5 source source NN 10_1101-436634 360 6 software software NN 10_1101-436634 360 7 platform platform NN 10_1101-436634 360 8 that that WDT 10_1101-436634 360 9 enables enable VBZ 10_1101-436634 360 10 applications application NNS 10_1101-436634 360 11 to to TO 10_1101-436634 360 12 be be VB 10_1101-436634 360 13 readily readily RB 10_1101-436634 360 14 installed instal VBN 10_1101-436634 360 15 and and CC 10_1101-436634 360 16 run run VBN 10_1101-436634 360 17 on on IN 10_1101-436634 360 18 any any DT 10_1101-436634 360 19 system system NN 10_1101-436634 360 20 . . . 
10_1101-436634 361 1 The the DT 10_1101-436634 361 2 availability availability NN 10_1101-436634 361 3 of of IN 10_1101-436634 361 4 RegTools RegTools NNP 10_1101-436634 361 5 with with IN 10_1101-436634 361 6 all all PDT 10_1101-436634 361 7 its -PRON- PRP$ 10_1101-436634 361 8 dependencies dependency NNS 10_1101-436634 361 9 as as IN 10_1101-436634 361 10 a a DT 10_1101-436634 361 11 Docker Docker NNP 10_1101-436634 361 12 container container NN 10_1101-436634 361 13 also also RB 10_1101-436634 361 14 facilitates facilitate VBZ 10_1101-436634 361 15 the the DT 10_1101-436634 361 16 integration integration NN 10_1101-436634 361 17 of of IN 10_1101-436634 361 18 the the DT 10_1101-436634 361 19 RegTools RegTools NNP 10_1101-436634 361 20 software software NN 10_1101-436634 361 21 into into IN 10_1101-436634 361 22 workflow workflow NN 10_1101-436634 361 23 pipelines pipeline NNS 10_1101-436634 361 24 that that WDT 10_1101-436634 361 25 support support VBP 10_1101-436634 361 26 Docker Docker NNP 10_1101-436634 361 27 images image NNS 10_1101-436634 361 28 . . . 
10_1101-436634 362 1 Data datum NNS 10_1101-436634 362 2 availability availability NN 10_1101-436634 362 3 Sequence Sequence NNP 10_1101-436634 362 4 data datum NNS 10_1101-436634 362 5 for for IN 10_1101-436634 362 6 each each DT 10_1101-436634 362 7 cohort cohort NN 10_1101-436634 362 8 analyzed analyze VBN 10_1101-436634 362 9 in in IN 10_1101-436634 362 10 this this DT 10_1101-436634 362 11 study study NN 10_1101-436634 362 12 are be VBP 10_1101-436634 362 13 available available JJ 10_1101-436634 362 14 through through IN 10_1101-436634 362 15 dbGaP dbgap NN 10_1101-436634 362 16 at at IN 10_1101-436634 362 17 the the DT 10_1101-436634 362 18 following follow VBG 10_1101-436634 362 19 accession accession NN 10_1101-436634 362 20 IDs id NNS 10_1101-436634 362 21 : : : 10_1101-436634 362 22 phs000178 phs000178 NN 10_1101-436634 362 23 for for IN 10_1101-436634 362 24 TCGA TCGA NNP 10_1101-436634 362 25 cohorts cohort NNS 10_1101-436634 362 26 , , , 10_1101-436634 362 27 phs001106 phs001106 NNP 10_1101-436634 362 28 for for IN 10_1101-436634 362 29 HCC HCC NNP 10_1101-436634 362 30 , , , 10_1101-436634 362 31 phs001049 phs001049 NNP 10_1101-436634 362 32 for for IN 10_1101-436634 362 33 SCLC SCLC NNP 10_1101-436634 362 34 , , , 10_1101-436634 362 35 and and CC 10_1101-436634 362 36 phs001623 phs001623 NNP 10_1101-436634 362 37 for for IN 10_1101-436634 362 38 OSCC OSCC NNP 10_1101-436634 362 39 . . . 
10_1101-436634 363 1 Statistically statistically RB 10_1101-436634 363 2 significant significant JJ 10_1101-436634 363 3 events event NNS 10_1101-436634 363 4 for for IN 10_1101-436634 363 5 D d NN 10_1101-436634 363 6 , , , 10_1101-436634 363 7 A a NN 10_1101-436634 363 8 , , , 10_1101-436634 363 9 and and CC 10_1101-436634 363 10 NDA NDA NNP 10_1101-436634 363 11 junctions junction NNS 10_1101-436634 363 12 across across IN 10_1101-436634 363 13 the the DT 10_1101-436634 363 14 four four CD 10_1101-436634 363 15 variant variant JJ 10_1101-436634 363 16 splicing splicing NN 10_1101-436634 363 17 windows window NNS 10_1101-436634 363 18 used use VBN 10_1101-436634 363 19 are be VBP 10_1101-436634 363 20 available available JJ 10_1101-436634 363 21 via via IN 10_1101-436634 363 22 Supplemental Supplemental NNP 10_1101-436634 363 23 Files Files NNP 10_1101-436634 363 24 1 1 CD 10_1101-436634 363 25 and and CC 10_1101-436634 363 26 2 2 CD 10_1101-436634 363 27 . . . 10_1101-436634 364 1 Statistically statistically RB 10_1101-436634 364 2 significant significant JJ 10_1101-436634 364 3 events event NNS 10_1101-436634 364 4 for for IN 10_1101-436634 364 5 DA DA NNP 10_1101-436634 364 6 junctions junction NNS 10_1101-436634 364 7 are be VBP 10_1101-436634 364 8 available available JJ 10_1101-436634 364 9 as as IN 10_1101-436634 364 10 Supplemental Supplemental NNP 10_1101-436634 364 11 Files Files NNP 10_1101-436634 364 12 3 3 CD 10_1101-436634 364 13 and and CC 10_1101-436634 364 14 4 4 CD 10_1101-436634 364 15 . . . 
10_1101-436634 365 1 Complete complete JJ 10_1101-436634 365 2 results result NNS 10_1101-436634 365 3 of of IN 10_1101-436634 365 4 gene gene NN 10_1101-436634 365 5 recurrence recurrence NN 10_1101-436634 365 6 analysis analysis NN 10_1101-436634 365 7 are be VBP 10_1101-436634 365 8 available available JJ 10_1101-436634 365 9 as as IN 10_1101-436634 365 10 Supplemental Supplemental NNP 10_1101-436634 365 11 Files Files NNP 10_1101-436634 365 12 5 5 CD 10_1101-436634 365 13 and and CC 10_1101-436634 365 14 6 6 CD 10_1101-436634 365 15 . . . 10_1101-436634 366 1 Acknowledgments acknowledgment NNS 10_1101-436634 366 2 We -PRON- PRP 10_1101-436634 366 3 thank thank VBP 10_1101-436634 366 4 the the DT 10_1101-436634 366 5 patients patient NNS 10_1101-436634 366 6 and and CC 10_1101-436634 366 7 their -PRON- PRP$ 10_1101-436634 366 8 families family NNS 10_1101-436634 366 9 for for IN 10_1101-436634 366 10 donation donation NN 10_1101-436634 366 11 of of IN 10_1101-436634 366 12 their -PRON- PRP$ 10_1101-436634 366 13 samples sample NNS 10_1101-436634 366 14 and and CC 10_1101-436634 366 15 participation participation NN 10_1101-436634 366 16 in in IN 10_1101-436634 366 17 clinical clinical JJ 10_1101-436634 366 18 trials trial NNS 10_1101-436634 366 19 . . . 10_1101-436634 367 1 We -PRON- PRP 10_1101-436634 367 2 would would MD 10_1101-436634 367 3 like like VB 10_1101-436634 367 4 to to TO 10_1101-436634 367 5 thank thank VB 10_1101-436634 367 6 Donald Donald NNP 10_1101-436634 367 7 Conrad Conrad NNP 10_1101-436634 367 8 for for IN 10_1101-436634 367 9 his -PRON- PRP$ 10_1101-436634 367 10 initial initial JJ 10_1101-436634 367 11 idea idea NN 10_1101-436634 367 12 to to TO 10_1101-436634 367 13 compare compare VB 10_1101-436634 367 14 to to IN 10_1101-436634 367 15 variant variant JJ 10_1101-436634 367 16 effect effect NN 10_1101-436634 367 17 predictor predictor NN 10_1101-436634 367 18 tools tool NNS 10_1101-436634 367 19 . . . 
10_1101-436634 368 1 Kelsy Kelsy NNP 10_1101-436634 368 2 Cotto Cotto NNP 10_1101-436634 368 3 was be VBD 10_1101-436634 368 4 supported support VBN 10_1101-436634 368 5 by by IN 10_1101-436634 368 6 Siteman Siteman NNP 10_1101-436634 368 7 Cancer Cancer NNP 10_1101-436634 368 8 Center Center NNP 10_1101-436634 368 9 under under IN 10_1101-436634 368 10 fund fund NN 10_1101-436634 368 11 number number NN 10_1101-436634 368 12 # # $ 10_1101-436634 368 13 3477 3477 CD 10_1101-436634 368 14 - - SYM 10_1101-436634 368 15 92400 92400 CD 10_1101-436634 368 16 and and CC 10_1101-436634 368 17 T32CA113275 T32CA113275 NNP 10_1101-436634 368 18 . . . 10_1101-436634 369 1 Avinash Avinash NNP 10_1101-436634 369 2 Ramu Ramu NNP 10_1101-436634 369 3 was be VBD 10_1101-436634 369 4 supported support VBN 10_1101-436634 369 5 by by IN 10_1101-436634 369 6 the the DT 10_1101-436634 369 7 ‘ ' `` 10_1101-436634 369 8 Burroughs Burroughs NNP 10_1101-436634 369 9 Wellcome Wellcome NNP 10_1101-436634 369 10 Fund Fund NNP 10_1101-436634 369 11 Institutional Institutional NNP 10_1101-436634 369 12 Program Program NNP 10_1101-436634 369 13 Unifying Unifying NNP 10_1101-436634 369 14 Population Population NNP 10_1101-436634 369 15 and and CC 10_1101-436634 369 16 Laboratory Laboratory NNP 10_1101-436634 369 17 Based Based NNP 10_1101-436634 369 18 Sciences Sciences NNPS 10_1101-436634 369 19 Award Award NNP 10_1101-436634 369 20 ’ ' '' 10_1101-436634 369 21 at at IN 10_1101-436634 369 22 Washington Washington NNP 10_1101-436634 369 23 University University NNP 10_1101-436634 369 24 . . . 
10_1101-436634 370 1 Malachi Malachi NNP 10_1101-436634 370 2 Griffith Griffith NNP 10_1101-436634 370 3 was be VBD 10_1101-436634 370 4 supported support VBN 10_1101-436634 370 5 by by IN 10_1101-436634 370 6 the the DT 10_1101-436634 370 7 National National NNP 10_1101-436634 370 8 Human Human NNP 10_1101-436634 370 9 Genome Genome NNP 10_1101-436634 370 10 Research Research NNP 10_1101-436634 370 11 Institute Institute NNP 10_1101-436634 370 12 ( ( -LRB- 10_1101-436634 370 13 NHGRI NHGRI NNP 10_1101-436634 370 14 ) ) -RRB- 10_1101-436634 370 15 of of IN 10_1101-436634 370 16 the the DT 10_1101-436634 370 17 National National NNP 10_1101-436634 370 18 Institutes Institutes NNPS 10_1101-436634 370 19 of of IN 10_1101-436634 370 20 Health Health NNP 10_1101-436634 370 21 ( ( -LRB- 10_1101-436634 370 22 NIH NIH NNP 10_1101-436634 370 23 ) ) -RRB- 10_1101-436634 370 24 under under IN 10_1101-436634 370 25 Award Award NNP 10_1101-436634 370 26 Number Number NNP 10_1101-436634 370 27 R00HG007940 R00HG007940 NNP 10_1101-436634 370 28 . . . 
10_1101-436634 371 1 Malachi Malachi NNP 10_1101-436634 371 2 Griffith Griffith NNP 10_1101-436634 371 3 and and CC 10_1101-436634 371 4 Obi Obi NNP 10_1101-436634 371 5 Griffith Griffith NNP 10_1101-436634 371 6 were be VBD 10_1101-436634 371 7 supported support VBN 10_1101-436634 371 8 by by IN 10_1101-436634 371 9 the the DT 10_1101-436634 371 10 NIH NIH NNP 10_1101-436634 371 11 National National NNP 10_1101-436634 371 12 Cancer Cancer NNP 10_1101-436634 371 13 Institute Institute NNP 10_1101-436634 371 14 ( ( -LRB- 10_1101-436634 371 15 NCI NCI NNP 10_1101-436634 371 16 ) ) -RRB- 10_1101-436634 371 17 under under IN 10_1101-436634 371 18 Award Award NNP 10_1101-436634 371 19 Numbers Numbers NNPS 10_1101-436634 371 20 U01CA209936 U01CA209936 NNP 10_1101-436634 371 21 , , , 10_1101-436634 371 22 U01CA231844 U01CA231844 NNP 10_1101-436634 371 23 , , , 10_1101-436634 371 24 U01CA248235 U01CA248235 NNP 10_1101-436634 371 25 U24CA237719 U24CA237719 NNP 10_1101-436634 371 26 . . . 10_1101-436634 372 1 Malachi Malachi NNP 10_1101-436634 372 2 Griffith Griffith NNP 10_1101-436634 372 3 and and CC 10_1101-436634 372 4 Megan Megan NNP 10_1101-436634 372 5 Richters Richters NNPS 10_1101-436634 372 6 were be VBD 10_1101-436634 372 7 supported support VBN 10_1101-436634 372 8 by by IN 10_1101-436634 372 9 the the DT 10_1101-436634 372 10 V V NNP 10_1101-436634 372 11 Foundation Foundation NNP 10_1101-436634 372 12 for for IN 10_1101-436634 372 13 Cancer Cancer NNP 10_1101-436634 372 14 Research Research NNP 10_1101-436634 372 15 under under IN 10_1101-436634 372 16 Award Award NNP 10_1101-436634 372 17 Number Number NNP 10_1101-436634 372 18 V2018 V2018 NNP 10_1101-436634 372 19 - - HYPH 10_1101-436634 372 20 007 007 CD 10_1101-436634 372 21 . . . 
10_1101-436634 373 1 The the DT 10_1101-436634 373 2 results result NNS 10_1101-436634 373 3 published publish VBN 10_1101-436634 373 4 here here RB 10_1101-436634 373 5 are be VBP 10_1101-436634 373 6 in in IN 10_1101-436634 373 7 whole whole JJ 10_1101-436634 373 8 or or CC 10_1101-436634 373 9 part part NN 10_1101-436634 373 10 based base VBN 10_1101-436634 373 11 upon upon IN 10_1101-436634 373 12 data datum NNS 10_1101-436634 373 13 generated generate VBN 10_1101-436634 373 14 by by IN 10_1101-436634 373 15 the the DT 10_1101-436634 373 16 TCGA TCGA NNP 10_1101-436634 373 17 Research Research NNP 10_1101-436634 373 18 Network Network NNP 10_1101-436634 373 19 : : : 10_1101-436634 373 20 https://www.cancer.gov/tcga https://www.cancer.gov/tcga ADD 10_1101-436634 373 21 . . . 10_1101-436634 374 1 Contributions Contributions NNP 10_1101-436634 374 2 K.C.C. K.C.C. NNP 10_1101-436634 375 1 and and CC 10_1101-436634 375 2 Y.-Y.F. Y.-Y.F. NNP 10_1101-436634 376 1 were be VBD 10_1101-436634 376 2 involved involve VBN 10_1101-436634 376 3 in in IN 10_1101-436634 376 4 all all DT 10_1101-436634 376 5 aspects aspect NNS 10_1101-436634 376 6 of of IN 10_1101-436634 376 7 this this DT 10_1101-436634 376 8 study study NN 10_1101-436634 376 9 , , , 10_1101-436634 376 10 including include VBG 10_1101-436634 376 11 designing design VBG 10_1101-436634 376 12 methodology methodology NN 10_1101-436634 376 13 , , , 10_1101-436634 376 14 developing develop VBG 10_1101-436634 376 15 and and CC 10_1101-436634 376 16 testing test VBG 10_1101-436634 376 17 the the DT 10_1101-436634 376 18 tool tool NN 10_1101-436634 376 19 software software NN 10_1101-436634 376 20 , , , 10_1101-436634 376 21 analyzing analyze VBG 10_1101-436634 376 22 and and CC 10_1101-436634 376 23 interpreting interpret VBG 10_1101-436634 376 24 data datum NNS 10_1101-436634 376 25 , , , 10_1101-436634 376 26 and and CC 10_1101-436634 376 27 writing write VBG 10_1101-436634 376 28 the the DT 10_1101-436634 376 29 
manuscript manuscript NN 10_1101-436634 377 32 , , , 10_1101-436634 377 33 with with IN 10_1101-436634 377 34 input input NN 10_1101-436634 377 35 from from IN 10_1101-436634 377 36 A.R. A.R. NNP 10_1101-436634 377 37 , , , 10_1101-436634 377 38 Z.L.S. Z.L.S. NNP 10_1101-436634 377 39 , , , 10_1101-436634 377 40 M.R. M.R. NNP 10_1101-436634 377 41 , , , 10_1101-436634 377 42 S.F. S.F. NNP 10_1101-436634 377 43 , , , 10_1101-436634 377 44 J.K. J.K. NNP 10_1101-436634 377 45 , , , 10_1101-436634 377 46 O.L.G. O.L.G. NNP 10_1101-436634 377 47 , , , 10_1101-436634 377 48 and and CC 10_1101-436634 377 49 M.G. M.G. NNP 10_1101-436634 378 1 A.R. A.R. NNP 10_1101-436634 379 1 designed design VBD 10_1101-436634 379 2 the the DT 10_1101-436634 379 3 tool tool NN 10_1101-436634 379 4 and and CC 10_1101-436634 379 5 led lead VBN 10_1101-436634 379 6 software software NN 10_1101-436634 379 7 development development NN 10_1101-436634 379 8 efforts effort NNS 10_1101-436634 379 9 . . . 10_1101-436634 380 1 Y.L. Y.L. NNP 10_1101-436634 380 2 , , , 10_1101-436634 380 3 W.C.C. W.C.C. NNP 10_1101-436634 380 4 , , , 10_1101-436634 380 5 R.U. R.U. NNP 10_1101-436634 380 6 , , , 10_1101-436634 380 7 and and CC 10_1101-436634 380 8 R.G. R.G. 
NNP 10_1101-436634 381 1 provided provide VBD 10_1101-436634 381 2 unpublished unpublished JJ 10_1101-436634 381 3 tumor tumor NN 10_1101-436634 381 4 datasets dataset NNS 10_1101-436634 381 5 and and CC 10_1101-436634 381 6 provided provide VBD 10_1101-436634 381 7 critical critical JJ 10_1101-436634 381 8 feedback feedback NN 10_1101-436634 381 9 on on IN 10_1101-436634 381 10 the the DT 10_1101-436634 381 11 manuscript manuscript NN 10_1101-436634 381 12 . . . 10_1101-436634 382 1 O.L.G. O.L.G. NNP 10_1101-436634 383 1 and and CC 10_1101-436634 383 2 M.G. M.G. NNP 10_1101-436634 384 1 supervised supervise VBD 10_1101-436634 384 2 the the DT 10_1101-436634 384 3 study study NN 10_1101-436634 384 4 . . . 10_1101-436634 385 1 All all DT 10_1101-436634 385 2 authors author NNS 10_1101-436634 385 3 read read VBP 10_1101-436634 385 4 and and CC 10_1101-436634 385 5 approved approve VBD 10_1101-436634 385 6 the the DT 10_1101-436634 385 7 final final JJ 10_1101-436634 385 8 manuscript manuscript NN 10_1101-436634 385 9 . . . 10_1101-436634 386 1 Conflicts conflict NNS 10_1101-436634 386 2 of of IN 10_1101-436634 386 3 Interest Interest NNP 10_1101-436634 386 4 W. W. NNP 10_1101-436634 386 5 Chapman Chapman NNP 10_1101-436634 386 6 serves serve VBZ 10_1101-436634 386 7 on on IN 10_1101-436634 386 8 the the DT 10_1101-436634 386 9 advisory advisory JJ 10_1101-436634 386 10 board board NN 10_1101-436634 386 11 for for IN 10_1101-436634 386 12 Novartis Novartis NNP 10_1101-436634 386 13 Pharmaceutical Pharmaceutical NNP 10_1101-436634 386 14 and and CC 10_1101-436634 386 15 reports report VBZ 10_1101-436634 386 16 intellectual intellectual JJ 10_1101-436634 386 17 property property NN 10_1101-436634 386 18 with with IN 10_1101-436634 386 19 Pathfinder Pathfinder NNP 10_1101-436634 386 20 Therapeutics Therapeutics NNP 10_1101-436634 386 21 . . . 10_1101-436634 387 1 R. R. 
NNP 10_1101-436634 387 2 Uppaluri Uppaluri NNP 10_1101-436634 387 3 reports report VBZ 10_1101-436634 387 4 grants grant NNS 10_1101-436634 387 5 and and CC 10_1101-436634 387 6 personal personal JJ 10_1101-436634 387 7 fees fee NNS 10_1101-436634 387 8 from from IN 10_1101-436634 387 9 Merck Merck NNP 10_1101-436634 387 10 Inc. Inc. NNP 10_1101-436634 387 11 R. R. NNP 10_1101-436634 387 12 Govindan Govindan NNP 10_1101-436634 387 13 served serve VBD 10_1101-436634 387 14 as as IN 10_1101-436634 387 15 consultant consultant NN 10_1101-436634 387 16 for for IN 10_1101-436634 387 17 Horizon Horizon NNP 10_1101-436634 387 18 Pharmaceuticals Pharmaceuticals NNPS 10_1101-436634 387 19 and and CC 10_1101-436634 387 20 GenePlus GenePlus NNP 10_1101-436634 387 21 . . . 10_1101-436634 388 1 References reference NNS 10_1101-436634 388 2 1 1 CD 10_1101-436634 388 3 . . . 10_1101-436634 389 1 Chabot Chabot NNP 10_1101-436634 389 2 , , , 10_1101-436634 389 3 B. B. NNP 10_1101-436634 390 1 & & CC 10_1101-436634 390 2 Shkreta Shkreta NNP 10_1101-436634 390 3 , , , 10_1101-436634 390 4 L. L. NNP 10_1101-436634 390 5 Defective Defective NNP 10_1101-436634 390 6 control control NN 10_1101-436634 390 7 of of IN 10_1101-436634 390 8 pre pre JJ 10_1101-436634 390 9 - - JJ 10_1101-436634 390 10 messenger messenger JJ 10_1101-436634 390 11 RNA RNA NNP 10_1101-436634 390 12 splicing splice VBG 10_1101-436634 390 13 in in IN 10_1101-436634 390 14 human human JJ 10_1101-436634 390 15 disease disease NN 10_1101-436634 390 16 . . . 10_1101-436634 391 1 J. J. NNP 10_1101-436634 392 1 Cell Cell NNP 10_1101-436634 392 2 Biol Biol NNP 10_1101-436634 392 3 . . . 10_1101-436634 393 1 212 212 CD 10_1101-436634 393 2 , , , 10_1101-436634 393 3 13–27 13–27 CD 10_1101-436634 393 4 ( ( -LRB- 10_1101-436634 393 5 2016 2016 CD 10_1101-436634 393 6 ) ) -RRB- 10_1101-436634 393 7 . . . 10_1101-436634 394 1 2 2 LS 10_1101-436634 394 2 . . . 
10_1101-436634 395 1 Vogelstein Vogelstein NNP 10_1101-436634 395 2 , , , 10_1101-436634 395 3 B. B. NNP 10_1101-436634 395 4 et et NNP 10_1101-436634 395 5 al al NNP 10_1101-436634 395 6 . . . 10_1101-436634 396 1 Cancer cancer NN 10_1101-436634 396 2 genome genome NN 10_1101-436634 396 3 landscapes landscape NNS 10_1101-436634 396 4 . . . 10_1101-436634 397 1 Science science NN 10_1101-436634 397 2 339 339 CD 10_1101-436634 397 3 , , , 10_1101-436634 397 4 1546–1558 1546–1558 CD 10_1101-436634 397 5 ( ( -LRB- 10_1101-436634 397 6 2013 2013 CD 10_1101-436634 397 7 ) ) -RRB- 10_1101-436634 397 8 . . . 10_1101-436634 398 1 3 3 LS 10_1101-436634 398 2 . . . 10_1101-436634 399 1 Soemedi Soemedi NNP 10_1101-436634 399 2 , , , 10_1101-436634 399 3 R. R. NNP 10_1101-436634 399 4 et et NNP 10_1101-436634 399 5 al al NNP 10_1101-436634 399 6 . . . 10_1101-436634 400 1 Pathogenic pathogenic JJ 10_1101-436634 400 2 variants variant NNS 10_1101-436634 400 3 that that WDT 10_1101-436634 400 4 alter alter VBP 10_1101-436634 400 5 protein protein NN 10_1101-436634 400 6 code code NN 10_1101-436634 400 7 often often RB 10_1101-436634 400 8 disrupt disrupt VBP 10_1101-436634 400 9 splicing splice VBG 10_1101-436634 400 10 . . . 10_1101-436634 401 1 Nat Nat NNP 10_1101-436634 401 2 . . . 10_1101-436634 402 1 Genet Genet NNP 10_1101-436634 402 2 . . . 10_1101-436634 403 1 49 49 CD 10_1101-436634 403 2 , , , 10_1101-436634 403 3 848–855 848–855 CD 10_1101-436634 403 4 ( ( -LRB- 10_1101-436634 403 5 2017 2017 CD 10_1101-436634 403 6 ) ) -RRB- 10_1101-436634 403 7 . . . 10_1101-436634 404 1 4 4 LS 10_1101-436634 404 2 . . . 10_1101-436634 405 1 Supek Supek NNP 10_1101-436634 405 2 , , , 10_1101-436634 405 3 F. F. NNP 10_1101-436634 405 4 , , , 10_1101-436634 405 5 Miñana Miñana NNP 10_1101-436634 405 6 , , , 10_1101-436634 405 7 B. B. NNP 10_1101-436634 405 8 , , , 10_1101-436634 405 9 Valcárcel Valcárcel NNP 10_1101-436634 405 10 , , , 10_1101-436634 405 11 J. J. 
NNP 10_1101-436634 405 12 , , , 10_1101-436634 405 13 Gabaldón Gabaldón NNP 10_1101-436634 405 14 , , , 10_1101-436634 405 15 T. T. NNP 10_1101-436634 405 16 & & CC 10_1101-436634 405 17 Lehner Lehner NNP 10_1101-436634 405 18 , , , 10_1101-436634 405 19 B. B. NNP 10_1101-436634 405 20 Synonymous synonymous JJ 10_1101-436634 405 21 mutations mutation NNS 10_1101-436634 405 22 frequently frequently RB 10_1101-436634 405 23 act act VBP 10_1101-436634 405 24 as as IN 10_1101-436634 405 25 driver driver NN 10_1101-436634 405 26 mutations mutation NNS 10_1101-436634 405 27 in in IN 10_1101-436634 405 28 human human JJ 10_1101-436634 405 29 cancers cancer NNS 10_1101-436634 405 30 . . . 10_1101-436634 406 1 Cell Cell NNP 10_1101-436634 406 2 156 156 CD 10_1101-436634 406 3 , , , 10_1101-436634 406 4 1324–1335 1324–1335 CD 10_1101-436634 406 5 ( ( -LRB- 10_1101-436634 406 6 2014 2014 CD 10_1101-436634 406 7 ) ) -RRB- 10_1101-436634 406 8 . . . 10_1101-436634 407 1 5 5 CD 10_1101-436634 407 2 . . . 10_1101-436634 408 1 Jung Jung NNP 10_1101-436634 408 2 , , , 10_1101-436634 408 3 H. H. NNP 10_1101-436634 408 4 et et NNP 10_1101-436634 408 5 al al NNP 10_1101-436634 408 6 . . . 10_1101-436634 409 1 Intron Intron NNP 10_1101-436634 409 2 retention retention NN 10_1101-436634 409 3 is be VBZ 10_1101-436634 409 4 a a DT 10_1101-436634 409 5 widespread widespread JJ 10_1101-436634 409 6 mechanism mechanism NN 10_1101-436634 409 7 of of IN 10_1101-436634 409 8 tumor tumor NN 10_1101-436634 409 9 - - HYPH 10_1101-436634 409 10 suppressor suppressor NN 10_1101-436634 409 11 inactivation inactivation NN 10_1101-436634 409 12 . . . 10_1101-436634 410 1 Nat Nat NNP 10_1101-436634 410 2 . . . 10_1101-436634 411 1 Genet Genet NNP 10_1101-436634 411 2 . . . 10_1101-436634 412 1 47 47 CD 10_1101-436634 412 2 , , , 10_1101-436634 412 3 1242–1248 1242–1248 CD 10_1101-436634 412 4 ( ( -LRB- 10_1101-436634 412 5 2015 2015 CD 10_1101-436634 412 6 ) ) -RRB- 10_1101-436634 412 7 . . . 
10_1101-436634 413 1 6 6 CD 10_1101-436634 413 2 . . . 10_1101-436634 414 1 Venables Venables NNP 10_1101-436634 414 2 , , , 10_1101-436634 414 3 J. J. NNP 10_1101-436634 414 4 P. P. NNP 10_1101-436634 414 5 Aberrant Aberrant NNP 10_1101-436634 414 6 and and CC 10_1101-436634 414 7 alternative alternative JJ 10_1101-436634 414 8 splicing splicing NN 10_1101-436634 414 9 in in IN 10_1101-436634 414 10 cancer cancer NN 10_1101-436634 414 11 . . . 10_1101-436634 415 1 Cancer cancer NN 10_1101-436634 415 2 Res Res NNP 10_1101-436634 415 3 . . . 10_1101-436634 416 1 64 64 CD 10_1101-436634 416 2 , , , 10_1101-436634 416 3 7647–7654 7647–7654 CD 10_1101-436634 416 4 ( ( -LRB- 10_1101-436634 416 5 2004 2004 CD 10_1101-436634 416 6 ) ) -RRB- 10_1101-436634 416 7 . . . 10_1101-436634 417 1 7 7 LS 10_1101-436634 417 2 . . . 10_1101-436634 418 1 Climente Climente NNP 10_1101-436634 418 2 - - HYPH 10_1101-436634 418 3 González González NNP 10_1101-436634 418 4 , , , 10_1101-436634 418 5 H. H. NNP 10_1101-436634 418 6 , , , 10_1101-436634 418 7 Porta Porta NNP 10_1101-436634 418 8 - - HYPH 10_1101-436634 418 9 Pardo Pardo NNP 10_1101-436634 418 10 , , , 10_1101-436634 418 11 E. E. NNP 10_1101-436634 418 12 , , , 10_1101-436634 418 13 Godzik Godzik NNP 10_1101-436634 418 14 , , , 10_1101-436634 418 15 A. a. NN 10_1101-436634 419 1 & & CC 10_1101-436634 419 2 Eyras Eyras NNP 10_1101-436634 419 3 , , , 10_1101-436634 419 4 E. E. NNP 10_1101-436634 419 5 The the DT 10_1101-436634 419 6 Functional Functional NNP 10_1101-436634 419 7 Impact Impact NNP 10_1101-436634 419 8 of of IN 10_1101-436634 419 9 Alternative Alternative NNP 10_1101-436634 419 10 Splicing Splicing NNP 10_1101-436634 419 11 in in IN 10_1101-436634 419 12 Cancer Cancer NNP 10_1101-436634 419 13 . . . 10_1101-436634 420 1 Cell Cell NNP 10_1101-436634 420 2 Rep. Rep. 
NNP 10_1101-436634 420 3 20 20 CD 10_1101-436634 420 4 , , , 10_1101-436634 420 5 2215–2226 2215–2226 CD 10_1101-436634 420 6 ( ( -LRB- 10_1101-436634 420 7 2017 2017 CD 10_1101-436634 420 8 ) ) -RRB- 10_1101-436634 420 9 . . . 10_1101-436634 421 1 8 8 LS 10_1101-436634 421 2 . . . 10_1101-436634 422 1 Chen Chen NNP 10_1101-436634 422 2 , , , 10_1101-436634 422 3 J. J. NNP 10_1101-436634 423 1 & & CC 10_1101-436634 423 2 Weiss Weiss NNP 10_1101-436634 423 3 , , , 10_1101-436634 423 4 W. W. NNP 10_1101-436634 423 5 A. A. NNP 10_1101-436634 424 1 Alternative alternative JJ 10_1101-436634 424 2 splicing splicing NN 10_1101-436634 424 3 in in IN 10_1101-436634 424 4 cancer cancer NN 10_1101-436634 424 5 : : : 10_1101-436634 424 6 implications implication NNS 10_1101-436634 424 7 for for IN 10_1101-436634 424 8 biology biology NN 10_1101-436634 424 9 and and CC 10_1101-436634 424 10 therapy therapy NN 10_1101-436634 424 11 . . . 10_1101-436634 425 1 Oncogene oncogene NN 10_1101-436634 425 2 34 34 CD 10_1101-436634 425 3 , , , 10_1101-436634 425 4 1–14 1–14 CD 10_1101-436634 425 5 ( ( -LRB- 10_1101-436634 425 6 2015 2015 CD 10_1101-436634 425 7 ) ) -RRB- 10_1101-436634 425 8 . . . 10_1101-436634 426 1 9 9 CD 10_1101-436634 426 2 . . . 10_1101-436634 427 1 Xiong Xiong NNP 10_1101-436634 427 2 , , , 10_1101-436634 427 3 H. H. NNP 10_1101-436634 427 4 Y. Y. NNP 10_1101-436634 427 5 et et NNP 10_1101-436634 427 6 al al NNP 10_1101-436634 427 7 . . . 10_1101-436634 428 1 RNA RNA NNP 10_1101-436634 428 2 splicing splice VBG 10_1101-436634 428 3 . . . 
10_1101-436634 429 1 The the DT 10_1101-436634 429 2 human human JJ 10_1101-436634 429 3 splicing splicing NN 10_1101-436634 429 4 code code NN 10_1101-436634 429 5 reveals reveal VBZ 10_1101-436634 429 6 new new JJ 10_1101-436634 429 7 insights insight NNS 10_1101-436634 429 8 into into IN 10_1101-436634 429 9 the the DT 10_1101-436634 429 10 genetic genetic JJ 10_1101-436634 429 11 determinants determinant NNS 10_1101-436634 429 12 of of IN 10_1101-436634 429 13 disease disease NN 10_1101-436634 429 14 . . . 10_1101-436634 430 1 Science science NN 10_1101-436634 430 2 347 347 CD 10_1101-436634 430 3 , , , 10_1101-436634 430 4 1254806 1254806 CD 10_1101-436634 430 5 ( ( -LRB- 10_1101-436634 430 6 2015 2015 CD 10_1101-436634 430 7 ) ) -RRB- 10_1101-436634 430 8 . . . 10_1101-436634 431 1 10 10 CD 10_1101-436634 431 2 . . . 10_1101-436634 432 1 Yeo Yeo NNP 10_1101-436634 432 2 , , , 10_1101-436634 432 3 G. G. NNP 10_1101-436634 432 4 & & CC 10_1101-436634 432 5 Burge Burge NNP 10_1101-436634 432 6 , , , 10_1101-436634 432 7 C. C. NNP 10_1101-436634 432 8 B. B. NNP 10_1101-436634 433 1 Maximum maximum JJ 10_1101-436634 433 2 entropy entropy JJ 10_1101-436634 433 3 modeling modeling NN 10_1101-436634 433 4 of of IN 10_1101-436634 433 5 short short JJ 10_1101-436634 433 6 sequence sequence NN 10_1101-436634 433 7 motifs motif NNS 10_1101-436634 433 8 with with IN 10_1101-436634 433 9 applications application NNS 10_1101-436634 433 10 to to TO 10_1101-436634 433 11 RNA rna VB 10_1101-436634 433 12 splicing splicing NN 10_1101-436634 433 13 signals signal NNS 10_1101-436634 433 14 . . . 10_1101-436634 434 1 J. J. NNP 10_1101-436634 434 2 Comput Comput NNP 10_1101-436634 434 3 . . . 10_1101-436634 435 1 Biol Biol NNP 10_1101-436634 435 2 . . . 10_1101-436634 436 1 11 11 CD 10_1101-436634 436 2 , , , 10_1101-436634 436 3 377–394 377–394 CD 10_1101-436634 436 4 ( ( -LRB- 10_1101-436634 436 5 2004 2004 CD 10_1101-436634 436 6 ) ) -RRB- 10_1101-436634 436 7 . . . 
10_1101-436634 438 31 11 11 CD 10_1101-436634 438 32 . . . 10_1101-436634 439 1 Fairbrother fairbrother RB 10_1101-436634 439 2 , , , 10_1101-436634 439 3 W. W. NNP 10_1101-436634 439 4 G. G. NNP 10_1101-436634 439 5 , , , 10_1101-436634 439 6 Yeh Yeh NNP 10_1101-436634 439 7 , , , 10_1101-436634 439 8 R.-F. R.-F. NNP 10_1101-436634 439 9 , , , 10_1101-436634 439 10 Sharp Sharp NNP 10_1101-436634 439 11 , , , 10_1101-436634 439 12 P. P. NNP 10_1101-436634 439 13 A. a. NN 10_1101-436634 440 1 & & CC 10_1101-436634 440 2 Burge Burge NNP 10_1101-436634 440 3 , , , 10_1101-436634 440 4 C. C. NNP 10_1101-436634 440 5 B. B. NNP 10_1101-436634 440 6 Predictive Predictive NNP 10_1101-436634 440 7 identification identification NN 10_1101-436634 440 8 of of IN 10_1101-436634 440 9 exonic exonic JJ 10_1101-436634 440 10 splicing splicing NN 10_1101-436634 440 11 enhancers enhancer NNS 10_1101-436634 440 12 in in IN 10_1101-436634 440 13 human human JJ 10_1101-436634 440 14 genes gene NNS 10_1101-436634 440 15 . . . 10_1101-436634 441 1 Science science NN 10_1101-436634 441 2 297 297 CD 10_1101-436634 441 3 , , , 10_1101-436634 441 4 1007–1013 1007–1013 CD 10_1101-436634 441 5 ( ( -LRB- 10_1101-436634 441 6 2002 2002 CD 10_1101-436634 441 7 ) ) -RRB- 10_1101-436634 441 8 . . . 10_1101-436634 442 1 12 12 CD 10_1101-436634 442 2 . . . 10_1101-436634 443 1 Wang Wang NNP 10_1101-436634 443 2 , , , 10_1101-436634 443 3 Z. Z. NNP 10_1101-436634 443 4 et et FW 10_1101-436634 443 5 al al NNP 10_1101-436634 443 6 . .
. 10_1101-436634 444 1 Systematic systematic JJ 10_1101-436634 444 2 identification identification NN 10_1101-436634 444 3 and and CC 10_1101-436634 444 4 analysis analysis NN 10_1101-436634 444 5 of of IN 10_1101-436634 444 6 exonic exonic JJ 10_1101-436634 444 7 splicing splicing NN 10_1101-436634 444 8 silencers silencer NNS 10_1101-436634 444 9 . . . 10_1101-436634 445 1 Cell cell NN 10_1101-436634 445 2 119 119 CD 10_1101-436634 445 3 , , , 10_1101-436634 445 4 831–845 831–845 CD 10_1101-436634 445 5 ( ( -LRB- 10_1101-436634 445 6 2004 2004 CD 10_1101-436634 445 7 ) ) -RRB- 10_1101-436634 445 8 . . . 10_1101-436634 446 1 13 13 CD 10_1101-436634 446 2 . . . 10_1101-436634 447 1 Jaganathan Jaganathan NNP 10_1101-436634 447 2 , , , 10_1101-436634 447 3 K. K. NNP 10_1101-436634 447 4 et et NNP 10_1101-436634 447 5 al al NNP 10_1101-436634 447 6 . . . 10_1101-436634 448 1 Predicting predict VBG 10_1101-436634 448 2 Splicing splicing NN 10_1101-436634 448 3 from from IN 10_1101-436634 448 4 Primary primary JJ 10_1101-436634 448 5 Sequence sequence NN 10_1101-436634 448 6 with with IN 10_1101-436634 448 7 Deep Deep NNP 10_1101-436634 448 8 Learning Learning NNP 10_1101-436634 448 9 . . . 10_1101-436634 449 1 Cell cell NN 10_1101-436634 449 2 176 176 CD 10_1101-436634 449 3 , , , 10_1101-436634 449 4 535–548.e24 535–548.e24 CD 10_1101-436634 449 5 ( ( -LRB- 10_1101-436634 449 6 2019 2019 CD 10_1101-436634 449 7 ) ) -RRB- 10_1101-436634 449 8 . . . 10_1101-436634 450 1 14 14 CD 10_1101-436634 450 2 . . . 10_1101-436634 451 1 Kahles Kahles NNP 10_1101-436634 451 2 , , , 10_1101-436634 451 3 A. a. NN 10_1101-436634 451 4 , , , 10_1101-436634 451 5 Ong Ong NNP 10_1101-436634 451 6 , , , 10_1101-436634 451 7 C. C. NNP 10_1101-436634 451 8 S. S. NNP 10_1101-436634 451 9 , , , 10_1101-436634 451 10 Zhong Zhong NNP 10_1101-436634 451 11 , , , 10_1101-436634 451 12 Y. Y. 
NNP 10_1101-436634 452 1 & & CC 10_1101-436634 452 2 Rätsch Rätsch NNP 10_1101-436634 452 3 , , , 10_1101-436634 452 4 G. G. NNP 10_1101-436634 452 5 SplAdder SplAdder NNP 10_1101-436634 452 6 : : : 10_1101-436634 452 7 identification identification NN 10_1101-436634 452 8 , , , 10_1101-436634 452 9 quantification quantification NN 10_1101-436634 452 10 and and CC 10_1101-436634 452 11 testing testing NN 10_1101-436634 452 12 of of IN 10_1101-436634 452 13 alternative alternative JJ 10_1101-436634 452 14 splicing splicing NN 10_1101-436634 452 15 events event NNS 10_1101-436634 452 16 from from IN 10_1101-436634 452 17 RNA RNA NNP 10_1101-436634 452 18 - - HYPH 10_1101-436634 452 19 Seq Seq NNP 10_1101-436634 452 20 data datum NNS 10_1101-436634 452 21 . . . 10_1101-436634 453 1 Bioinformatics Bioinformatics NNP 10_1101-436634 453 2 32 32 CD 10_1101-436634 453 3 , , , 10_1101-436634 453 4 1840–1847 1840–1847 CD 10_1101-436634 453 5 ( ( -LRB- 10_1101-436634 453 6 2016 2016 CD 10_1101-436634 453 7 ) ) -RRB- 10_1101-436634 453 8 . . . 10_1101-436634 454 1 15 15 CD 10_1101-436634 454 2 . . . 10_1101-436634 455 1 Trincado Trincado NNP 10_1101-436634 455 2 , , , 10_1101-436634 455 3 J. J. NNP 10_1101-436634 455 4 L. L. NNP 10_1101-436634 455 5 et et NNP 10_1101-436634 455 6 al al NNP 10_1101-436634 455 7 . . . 10_1101-436634 456 1 SUPPA2 SUPPA2 NNP 10_1101-436634 456 2 : : : 10_1101-436634 456 3 fast fast RB 10_1101-436634 456 4 , , , 10_1101-436634 456 5 accurate accurate JJ 10_1101-436634 456 6 , , , 10_1101-436634 456 7 and and CC 10_1101-436634 456 8 uncertainty uncertainty NN 10_1101-436634 456 9 - - HYPH 10_1101-436634 456 10 aware aware JJ 10_1101-436634 456 11 differential differential NN 10_1101-436634 456 12 splicing splice VBG 10_1101-436634 456 13 analysis analysis NN 10_1101-436634 456 14 across across IN 10_1101-436634 456 15 multiple multiple JJ 10_1101-436634 456 16 conditions condition NNS 10_1101-436634 456 17 . . . 
10_1101-436634 457 1 Genome Genome NNP 10_1101-436634 457 2 Biol Biol NNP 10_1101-436634 457 3 . . . 10_1101-436634 458 1 19 19 CD 10_1101-436634 458 2 , , , 10_1101-436634 458 3 40 40 CD 10_1101-436634 458 4 ( ( -LRB- 10_1101-436634 458 5 2018 2018 CD 10_1101-436634 458 6 ) ) -RRB- 10_1101-436634 458 7 . . . 10_1101-436634 459 1 16 16 CD 10_1101-436634 459 2 . . . 10_1101-436634 460 1 Kahles Kahles NNP 10_1101-436634 460 2 , , , 10_1101-436634 460 3 A. A. NNP 10_1101-436634 460 4 et et FW 10_1101-436634 460 5 al al NNP 10_1101-436634 460 6 . . . 10_1101-436634 461 1 Comprehensive comprehensive JJ 10_1101-436634 461 2 Analysis analysis NN 10_1101-436634 461 3 of of IN 10_1101-436634 461 4 Alternative Alternative NNP 10_1101-436634 461 5 Splicing Splicing NNP 10_1101-436634 461 6 Across Across NNP 10_1101-436634 461 7 Tumors Tumors NNPS 10_1101-436634 461 8 from from IN 10_1101-436634 461 9 8,705 8,705 CD 10_1101-436634 461 10 Patients patient NNS 10_1101-436634 461 11 . . . 10_1101-436634 462 1 Cancer Cancer NNP 10_1101-436634 462 2 Cell Cell NNP 10_1101-436634 462 3 34 34 CD 10_1101-436634 462 4 , , , 10_1101-436634 462 5 211–224.e6 211–224.e6 CD 10_1101-436634 462 6 ( ( -LRB- 10_1101-436634 462 7 2018 2018 CD 10_1101-436634 462 8 ) ) -RRB- 10_1101-436634 462 9 . . . 10_1101-436634 463 1 17 17 CD 10_1101-436634 463 2 . . . 10_1101-436634 464 1 Li Li NNP 10_1101-436634 464 2 , , , 10_1101-436634 464 3 Y. Y. NNP 10_1101-436634 464 4 I. I. NNP 10_1101-436634 464 5 et et NNP 10_1101-436634 464 6 al al NNP 10_1101-436634 464 7 . . . 10_1101-436634 465 1 Annotation annotation NN 10_1101-436634 465 2 - - HYPH 10_1101-436634 465 3 free free JJ 10_1101-436634 465 4 quantification quantification NN 10_1101-436634 465 5 of of IN 10_1101-436634 465 6 RNA RNA NNP 10_1101-436634 465 7 splicing splicing NN 10_1101-436634 465 8 using use VBG 10_1101-436634 465 9 LeafCutter LeafCutter NNP 10_1101-436634 465 10 . . . 10_1101-436634 466 1 Nat Nat NNP 10_1101-436634 466 2 . . . 
10_1101-436634 467 1 Genet Genet NNP 10_1101-436634 467 2 . . . 10_1101-436634 468 1 50 50 CD 10_1101-436634 468 2 , , , 10_1101-436634 468 3 151–158 151–158 CD 10_1101-436634 468 4 ( ( -LRB- 10_1101-436634 468 5 2018 2018 CD 10_1101-436634 468 6 ) ) -RRB- 10_1101-436634 468 7 . . . 10_1101-436634 469 1 18 18 CD 10_1101-436634 469 2 . . . 10_1101-436634 470 1 Monlong Monlong NNP 10_1101-436634 470 2 , , , 10_1101-436634 470 3 J. J. NNP 10_1101-436634 470 4 , , , 10_1101-436634 470 5 Calvo Calvo NNP 10_1101-436634 470 6 , , , 10_1101-436634 470 7 M. M. NNP 10_1101-436634 470 8 , , , 10_1101-436634 470 9 Ferreira Ferreira NNP 10_1101-436634 470 10 , , , 10_1101-436634 470 11 P. P. NNP 10_1101-436634 470 12 G. G. NNP 10_1101-436634 470 13 & & CC 10_1101-436634 470 14 Guigó Guigó NNP 10_1101-436634 470 15 , , , 10_1101-436634 470 16 R. R. NNP 10_1101-436634 470 17 Identification Identification NNP 10_1101-436634 470 18 of of IN 10_1101-436634 470 19 genetic genetic JJ 10_1101-436634 470 20 variants variant NNS 10_1101-436634 470 21 associated associate VBN 10_1101-436634 470 22 with with IN 10_1101-436634 470 23 alternative alternative JJ 10_1101-436634 470 24 splicing splicing NN 10_1101-436634 470 25 using use VBG 10_1101-436634 470 26 sQTLseekeR. sqtlseeker. NN 10_1101-436634 471 1 Nat Nat NNP 10_1101-436634 471 2 . . . 10_1101-436634 472 1 Commun Commun VBN 10_1101-436634 472 2 . . . 10_1101-436634 473 1 5 5 CD 10_1101-436634 473 2 , , , 10_1101-436634 473 3 4698 4698 CD 10_1101-436634 473 4 ( ( -LRB- 10_1101-436634 473 5 2014 2014 CD 10_1101-436634 473 6 ) ) -RRB- 10_1101-436634 473 7 . . . 10_1101-436634 474 1 19 19 CD 10_1101-436634 474 2 . . . 10_1101-436634 475 1 Li Li NNP 10_1101-436634 475 2 , , , 10_1101-436634 475 3 Y. Y. NNP 10_1101-436634 475 4 I. I. NNP 10_1101-436634 475 5 et et NNP 10_1101-436634 475 6 al al NNP 10_1101-436634 475 7 . . . 
10_1101-436634 476 1 RNA RNA NNP 10_1101-436634 476 2 splicing splicing NN 10_1101-436634 476 3 is be VBZ 10_1101-436634 476 4 a a DT 10_1101-436634 476 5 primary primary JJ 10_1101-436634 476 6 link link NN 10_1101-436634 476 7 between between IN 10_1101-436634 476 8 genetic genetic JJ 10_1101-436634 476 9 variation variation NN 10_1101-436634 476 10 and and CC 10_1101-436634 476 11 disease disease NN 10_1101-436634 476 12 . . . 10_1101-436634 477 1 Science science NN 10_1101-436634 477 2 352 352 CD 10_1101-436634 477 3 , , , 10_1101-436634 477 4 600–604 600–604 CD 10_1101-436634 477 5 ( ( -LRB- 10_1101-436634 477 6 2016 2016 CD 10_1101-436634 477 7 ) ) -RRB- 10_1101-436634 477 8 . . . 10_1101-436634 478 1 20 20 CD 10_1101-436634 478 2 . . . 10_1101-436634 479 1 Jayasinghe Jayasinghe NNP 10_1101-436634 479 2 , , , 10_1101-436634 479 3 R. R. NNP 10_1101-436634 479 4 G. G. NNP 10_1101-436634 479 5 et et NNP 10_1101-436634 479 6 al al NNP 10_1101-436634 479 7 . . . 10_1101-436634 480 1 Systematic systematic JJ 10_1101-436634 480 2 Analysis Analysis NNP 10_1101-436634 480 3 of of IN 10_1101-436634 480 4 Splice Splice NNP 10_1101-436634 480 5 - - HYPH 10_1101-436634 480 6 Site site NN 10_1101-436634 480 7 - - HYPH 10_1101-436634 480 8 Creating creating NN 10_1101-436634 480 9 Mutations Mutations NNPS 10_1101-436634 480 10 in in IN 10_1101-436634 480 11 Cancer Cancer NNP 10_1101-436634 480 12 . . . 10_1101-436634 481 1 Cell Cell NNP 10_1101-436634 481 2 Rep. Rep. NNP 10_1101-436634 481 3 23 23 CD 10_1101-436634 481 4 , , , 10_1101-436634 481 5 270–281.e3 270–281.e3 NNS 10_1101-436634 481 6 ( ( -LRB- 10_1101-436634 481 7 2018 2018 CD 10_1101-436634 481 8 ) ) -RRB- 10_1101-436634 481 9 . . . 10_1101-436634 482 1 21 21 CD 10_1101-436634 482 2 . . . 10_1101-436634 483 1 Viner Viner NNP 10_1101-436634 483 2 , , , 10_1101-436634 483 3 C. C. NNP 10_1101-436634 483 4 , , , 10_1101-436634 483 5 Dorman Dorman NNP 10_1101-436634 483 6 , , , 10_1101-436634 483 7 S. S. 
NNP 10_1101-436634 483 8 N. N. NNP 10_1101-436634 483 9 , , , 10_1101-436634 483 10 Shirley Shirley NNP 10_1101-436634 483 11 , , , 10_1101-436634 483 12 B. B. NNP 10_1101-436634 483 13 C. C. NNP 10_1101-436634 483 14 & & CC 10_1101-436634 483 15 Rogan Rogan NNP 10_1101-436634 483 16 , , , 10_1101-436634 483 17 P. P. NNP 10_1101-436634 483 18 K. K. NNP 10_1101-436634 483 19 Validation Validation NNP 10_1101-436634 483 20 of of IN 10_1101-436634 483 21 predicted predict VBD 10_1101-436634 483 22 mRNA mrna NN 10_1101-436634 483 23 splicing splicing NN 10_1101-436634 483 24 mutations mutation NNS 10_1101-436634 483 25 using use VBG 10_1101-436634 483 26 high high JJ 10_1101-436634 483 27 - - HYPH 10_1101-436634 483 28 throughput throughput NN 10_1101-436634 483 29 transcriptome transcriptome DT 10_1101-436634 483 30 data datum NNS 10_1101-436634 483 31 . . . 10_1101-436634 484 1 F1000Res f1000re NNS 10_1101-436634 484 2 . . . 10_1101-436634 485 1 3 3 CD 10_1101-436634 485 2 , , , 10_1101-436634 485 3 ( ( -LRB- 10_1101-436634 485 4 2014 2014 CD 10_1101-436634 485 5 ) ) -RRB- 10_1101-436634 485 6 . . . 10_1101-436634 486 1 22 22 CD 10_1101-436634 486 2 . . . 10_1101-436634 487 1 Shirley Shirley NNP 10_1101-436634 487 2 , , , 10_1101-436634 487 3 B. B. NNP 10_1101-436634 487 4 C. C. NNP 10_1101-436634 487 5 , , , 10_1101-436634 487 6 Mucaki Mucaki NNP 10_1101-436634 487 7 , , , 10_1101-436634 487 8 E. E. NNP 10_1101-436634 487 9 J. J. NNP 10_1101-436634 488 1 & & CC 10_1101-436634 488 2 Rogan Rogan NNP 10_1101-436634 488 3 , , , 10_1101-436634 488 4 P. P. NNP 10_1101-436634 488 5 K. K. 
NNP 10_1101-436634 488 6 Pan Pan NNP 10_1101-436634 488 7 - - HYPH 10_1101-436634 488 8 cancer cancer NN 10_1101-436634 488 9 repository repository NN 10_1101-436634 488 10 of of IN 10_1101-436634 488 11 validated validate VBN 10_1101-436634 488 12 natural natural JJ 10_1101-436634 488 13 and and CC 10_1101-436634 488 14 cryptic cryptic JJ 10_1101-436634 488 15 mRNA mrna NN 10_1101-436634 488 16 splicing splicing NN 10_1101-436634 488 17 mutations mutation NNS 10_1101-436634 488 18 . . . 10_1101-436634 489 1 F1000Res f1000re NNS 10_1101-436634 489 2 . . . 10_1101-436634 490 1 7 7 CD 10_1101-436634 490 2 , , , 10_1101-436634 490 3 1908 1908 CD 10_1101-436634 490 4 ( ( -LRB- 10_1101-436634 490 5 2018 2018 CD 10_1101-436634 490 6 ) ) -RRB- 10_1101-436634 490 7 . . . 10_1101-436634 491 1 23 23 CD 10_1101-436634 491 2 . . . 10_1101-436634 492 1 Shiraishi Shiraishi NNP 10_1101-436634 492 2 , , , 10_1101-436634 492 3 Y. Y. NNP 10_1101-436634 492 4 et et NNP 10_1101-436634 492 5 al al NNP 10_1101-436634 492 6 . . . 
10_1101-436634 493 1 A a DT 10_1101-436634 493 2 comprehensive comprehensive JJ 10_1101-436634 493 3 characterization characterization NN 10_1101-436634 493 4 of of IN 10_1101-436634 493 5 cis cis NN 10_1101-436634 493 6 - - HYPH 10_1101-436634 493 7 acting act VBG 10_1101-436634 493 8 splicing splicing NN 10_1101-436634 493 9 - - HYPH 10_1101-436634 493 10 associated associate VBN 10_1101-436634 494 31 variants variant NNS 10_1101-436634 494 32 in in IN 10_1101-436634 494 33 human human JJ 10_1101-436634 494 34 cancer cancer NN 10_1101-436634 494 35 . . . 10_1101-436634 495 1 Genome Genome NNP 10_1101-436634 495 2 Res Res NNP 10_1101-436634 495 3 . . . 10_1101-436634 496 1 28 28 CD 10_1101-436634 496 2 , , , 10_1101-436634 496 3 1111–1125 1111–1125 CD 10_1101-436634 496 4 ( ( -LRB- 10_1101-436634 496 5 2018 2018 CD 10_1101-436634 496 6 ) ) -RRB- 10_1101-436634 496 7 . . . 10_1101-436634 497 1 24 24 CD 10_1101-436634 497 2 . . . 10_1101-436634 498 1 GTEx GTEx VBZ 10_1101-436634 498 2 Consortium Consortium NNP 10_1101-436634 498 3 . . .
10_1101-436634 499 1 The the DT 10_1101-436634 499 2 Genotype Genotype NNP 10_1101-436634 499 3 - - HYPH 10_1101-436634 499 4 Tissue Tissue NNP 10_1101-436634 499 5 Expression expression NN 10_1101-436634 499 6 ( ( -LRB- 10_1101-436634 499 7 GTEx GTEx NNP 10_1101-436634 499 8 ) ) -RRB- 10_1101-436634 499 9 project project NN 10_1101-436634 499 10 . . . 10_1101-436634 500 1 Nat Nat NNP 10_1101-436634 500 2 . . . 10_1101-436634 501 1 Genet Genet NNP 10_1101-436634 501 2 . . . 10_1101-436634 502 1 45 45 CD 10_1101-436634 502 2 , , , 10_1101-436634 502 3 580 580 CD 10_1101-436634 502 4 – – : 10_1101-436634 502 5 585 585 CD 10_1101-436634 502 6 ( ( -LRB- 10_1101-436634 502 7 2013 2013 CD 10_1101-436634 502 8 ) ) -RRB- 10_1101-436634 502 9 . . . 10_1101-436634 503 1 25 25 CD 10_1101-436634 503 2 . . . 10_1101-436634 504 1 McLaren McLaren NNP 10_1101-436634 504 2 , , , 10_1101-436634 504 3 W. W. NNP 10_1101-436634 504 4 et et NNP 10_1101-436634 504 5 al al NNP 10_1101-436634 504 6 . . . 10_1101-436634 505 1 The the DT 10_1101-436634 505 2 Ensembl Ensembl NNP 10_1101-436634 505 3 Variant Variant NNP 10_1101-436634 505 4 Effect Effect NNP 10_1101-436634 505 5 Predictor Predictor NNP 10_1101-436634 505 6 . . . 10_1101-436634 506 1 Genome Genome NNP 10_1101-436634 506 2 Biol Biol NNP 10_1101-436634 506 3 . . . 10_1101-436634 507 1 17 17 CD 10_1101-436634 507 2 , , , 10_1101-436634 507 3 122 122 CD 10_1101-436634 507 4 ( ( -LRB- 10_1101-436634 507 5 2016 2016 CD 10_1101-436634 507 6 ) ) -RRB- 10_1101-436634 507 7 . . . 10_1101-436634 508 1 26 26 CD 10_1101-436634 508 2 . . . 10_1101-436634 509 1 Li Li NNP 10_1101-436634 509 2 , , , 10_1101-436634 509 3 H. H. NNP 10_1101-436634 509 4 et et NNP 10_1101-436634 509 5 al al NNP 10_1101-436634 509 6 . . . 
10_1101-436634 510 1 The the DT 10_1101-436634 510 2 Sequence Sequence NNP 10_1101-436634 510 3 Alignment Alignment NNP 10_1101-436634 510 4 / / SYM 10_1101-436634 510 5 Map Map NNP 10_1101-436634 510 6 format format NN 10_1101-436634 510 7 and and CC 10_1101-436634 510 8 SAMtools samtool NNS 10_1101-436634 510 9 . . . 10_1101-436634 511 1 Bioinformatics Bioinformatics NNP 10_1101-436634 511 2 25 25 CD 10_1101-436634 511 3 , , , 10_1101-436634 511 4 2078 2078 CD 10_1101-436634 511 5 – – : 10_1101-436634 511 6 2079 2079 CD 10_1101-436634 511 7 ( ( -LRB- 10_1101-436634 511 8 2009 2009 CD 10_1101-436634 511 9 ) ) -RRB- 10_1101-436634 511 10 . . . 10_1101-436634 512 1 27 27 CD 10_1101-436634 512 2 . . . 10_1101-436634 513 1 Sondka sondka NN 10_1101-436634 513 2 , , , 10_1101-436634 513 3 Z. Z. NNP 10_1101-436634 513 4 et et FW 10_1101-436634 513 5 al al NNP 10_1101-436634 513 6 . . . 10_1101-436634 514 1 The the DT 10_1101-436634 514 2 COSMIC COSMIC NNP 10_1101-436634 514 3 Cancer Cancer NNP 10_1101-436634 514 4 Gene Gene NNP 10_1101-436634 514 5 Census Census NNP 10_1101-436634 514 6 : : : 10_1101-436634 514 7 describing describe VBG 10_1101-436634 514 8 genetic genetic JJ 10_1101-436634 514 9 dysfunction dysfunction NN 10_1101-436634 514 10 across across IN 10_1101-436634 514 11 all all DT 10_1101-436634 514 12 human human JJ 10_1101-436634 514 13 cancers cancer NNS 10_1101-436634 514 14 . . . 10_1101-436634 515 1 Nat Nat NNP 10_1101-436634 515 2 . . . 10_1101-436634 516 1 Rev. Rev. NNP 10_1101-436634 517 1 Cancer cancer NN 10_1101-436634 517 2 18 18 CD 10_1101-436634 517 3 , , , 10_1101-436634 517 4 696–705 696–705 CD 10_1101-436634 517 5 ( ( -LRB- 10_1101-436634 517 6 2018 2018 CD 10_1101-436634 517 7 ) ) -RRB- 10_1101-436634 517 8 . . . 10_1101-436634 518 1 28 28 CD 10_1101-436634 518 2 . . . 10_1101-436634 519 1 Robinson Robinson NNP 10_1101-436634 519 2 , , , 10_1101-436634 519 3 J. J. NNP 10_1101-436634 519 4 T. T. 
NNP 10_1101-436634 519 5 et et NNP 10_1101-436634 519 6 al al NNP 10_1101-436634 519 7 . . . 10_1101-436634 520 1 Integrative integrative JJ 10_1101-436634 520 2 genomics genomic NNS 10_1101-436634 520 3 viewer viewer NN 10_1101-436634 520 4 . . . 10_1101-436634 521 1 Nat Nat NNP 10_1101-436634 521 2 . . . 10_1101-436634 522 1 Biotechnol Biotechnol NNS 10_1101-436634 522 2 . . . 10_1101-436634 523 1 29 29 CD 10_1101-436634 523 2 , , , 10_1101-436634 523 3 24–26 24–26 CD 10_1101-436634 523 4 ( ( -LRB- 10_1101-436634 523 5 2011 2011 CD 10_1101-436634 523 6 ) ) -RRB- 10_1101-436634 523 7 . . . 10_1101-436634 524 1 29 29 CD 10_1101-436634 524 2 . . . 10_1101-436634 525 1 Dobin Dobin NNP 10_1101-436634 525 2 , , , 10_1101-436634 525 3 A. A. NNP 10_1101-436634 525 4 et et FW 10_1101-436634 525 5 al al NNP 10_1101-436634 525 6 . . . 10_1101-436634 526 1 STAR STAR NNP 10_1101-436634 526 2 : : : 10_1101-436634 526 3 ultrafast ultrafast NNP 10_1101-436634 526 4 universal universal JJ 10_1101-436634 526 5 RNA RNA NNP 10_1101-436634 526 6 - - HYPH 10_1101-436634 526 7 seq seq NN 10_1101-436634 526 8 aligner aligner NN 10_1101-436634 526 9 . . . 10_1101-436634 527 1 Bioinformatics Bioinformatics NNP 10_1101-436634 527 2 29 29 CD 10_1101-436634 527 3 , , , 10_1101-436634 527 4 15–21 15–21 NNP 10_1101-436634 527 5 ( ( -LRB- 10_1101-436634 527 6 2013 2013 CD 10_1101-436634 527 7 ) ) -RRB- 10_1101-436634 527 8 . . . 10_1101-436634 528 1 30 30 CD 10_1101-436634 528 2 . . . 10_1101-436634 529 1 Kim Kim NNP 10_1101-436634 529 2 , , , 10_1101-436634 529 3 D. D. NNP 10_1101-436634 529 4 , , , 10_1101-436634 529 5 Langmead Langmead NNP 10_1101-436634 529 6 , , , 10_1101-436634 529 7 B. B. NNP 10_1101-436634 530 1 & & CC 10_1101-436634 530 2 Salzberg Salzberg NNP 10_1101-436634 530 3 , , , 10_1101-436634 530 4 S. S. NNP 10_1101-436634 530 5 L. L. 
NNP 10_1101-436634 530 6 HISAT HISAT NNP 10_1101-436634 530 7 : : : 10_1101-436634 530 8 a a DT 10_1101-436634 530 9 fast fast JJ 10_1101-436634 530 10 spliced spliced JJ 10_1101-436634 530 11 aligner aligner NN 10_1101-436634 530 12 with with IN 10_1101-436634 530 13 low low JJ 10_1101-436634 530 14 memory memory NN 10_1101-436634 530 15 requirements requirement NNS 10_1101-436634 530 16 . . . 10_1101-436634 531 1 Nat Nat NNP 10_1101-436634 531 2 . . . 10_1101-436634 532 1 Methods method NNS 10_1101-436634 532 2 12 12 CD 10_1101-436634 532 3 , , , 10_1101-436634 532 4 357–360 357–360 CD 10_1101-436634 532 5 ( ( -LRB- 10_1101-436634 532 6 2015 2015 CD 10_1101-436634 532 7 ) ) -RRB- 10_1101-436634 532 8 . . . 10_1101-436634 533 1 31 31 CD 10_1101-436634 533 2 . . . 10_1101-436634 534 1 Kim Kim NNP 10_1101-436634 534 2 , , , 10_1101-436634 534 3 D. D. NNP 10_1101-436634 534 4 et et NNP 10_1101-436634 534 5 al al NNP 10_1101-436634 534 6 . . . 10_1101-436634 535 1 TopHat2 TopHat2 NNP 10_1101-436634 535 2 : : : 10_1101-436634 535 3 accurate accurate JJ 10_1101-436634 535 4 alignment alignment NN 10_1101-436634 535 5 of of IN 10_1101-436634 535 6 transcriptomes transcriptome NNS 10_1101-436634 535 7 in in IN 10_1101-436634 535 8 the the DT 10_1101-436634 535 9 presence presence NN 10_1101-436634 535 10 of of IN 10_1101-436634 535 11 insertions insertion NNS 10_1101-436634 535 12 , , , 10_1101-436634 535 13 deletions deletion NNS 10_1101-436634 535 14 and and CC 10_1101-436634 535 15 gene gene NN 10_1101-436634 535 16 fusions fusion NNS 10_1101-436634 535 17 . . . 10_1101-436634 536 1 Genome Genome NNP 10_1101-436634 536 2 Biol Biol NNP 10_1101-436634 536 3 . . . 10_1101-436634 537 1 14 14 CD 10_1101-436634 537 2 , , , 10_1101-436634 537 3 R36 R36 NNP 10_1101-436634 537 4 ( ( -LRB- 10_1101-436634 537 5 2013 2013 CD 10_1101-436634 537 6 ) ) -RRB- 10_1101-436634 537 7 . . . 10_1101-436634 538 1 32 32 CD 10_1101-436634 538 2 . . . 
10_1101-436634 539 1 Conway Conway NNP 10_1101-436634 539 2 , , , 10_1101-436634 539 3 J. J. NNP 10_1101-436634 539 4 R. R. NNP 10_1101-436634 539 5 , , , 10_1101-436634 539 6 Lex Lex NNP 10_1101-436634 539 7 , , , 10_1101-436634 539 8 A. a. NN 10_1101-436634 540 1 & & CC 10_1101-436634 540 2 Gehlenborg Gehlenborg NNP 10_1101-436634 540 3 , , , 10_1101-436634 540 4 N. N. NNP 10_1101-436634 540 5 UpSetR UpSetR NNP 10_1101-436634 540 6 : : : 10_1101-436634 540 7 an an DT 10_1101-436634 540 8 R r NN 10_1101-436634 540 9 package package NN 10_1101-436634 540 10 for for IN 10_1101-436634 540 11 the the DT 10_1101-436634 540 12 visualization visualization NN 10_1101-436634 540 13 of of IN 10_1101-436634 540 14 intersecting intersect VBG 10_1101-436634 540 15 sets set NNS 10_1101-436634 540 16 and and CC 10_1101-436634 540 17 their -PRON- PRP$ 10_1101-436634 540 18 properties property NNS 10_1101-436634 540 19 . . . 10_1101-436634 541 1 Bioinformatics Bioinformatics NNP 10_1101-436634 541 2 33 33 CD 10_1101-436634 541 3 , , , 10_1101-436634 541 4 2938–2940 2938–2940 CD 10_1101-436634 541 5 ( ( -LRB- 10_1101-436634 541 6 2017 2017 CD 10_1101-436634 541 7 ) ) -RRB- 10_1101-436634 541 8 . . . 10_1101-436634 542 1 33 33 CD 10_1101-436634 542 2 . . . 10_1101-436634 543 1 Surget Surget NNP 10_1101-436634 543 2 , , , 10_1101-436634 543 3 S. S. NNP 10_1101-436634 543 4 , , , 10_1101-436634 543 5 Khoury Khoury NNP 10_1101-436634 543 6 , , , 10_1101-436634 543 7 M. M. NNP 10_1101-436634 543 8 P. P. NNP 10_1101-436634 543 9 & & CC 10_1101-436634 543 10 Bourdon Bourdon NNP 10_1101-436634 543 11 , , , 10_1101-436634 543 12 J.-C. J.-C. 
NNP 10_1101-436634 543 13 Uncovering uncover VBG 10_1101-436634 543 14 the the DT 10_1101-436634 543 15 role role NN 10_1101-436634 543 16 of of IN 10_1101-436634 543 17 p53 p53 NN 10_1101-436634 543 18 splice splice NN 10_1101-436634 543 19 variants variant NNS 10_1101-436634 543 20 in in IN 10_1101-436634 543 21 human human JJ 10_1101-436634 543 22 malignancy malignancy NN 10_1101-436634 543 23 : : : 10_1101-436634 543 24 a a DT 10_1101-436634 543 25 clinical clinical JJ 10_1101-436634 543 26 perspective perspective NN 10_1101-436634 543 27 . . . 10_1101-436634 544 1 Onco Onco NNP 10_1101-436634 544 2 . . . 10_1101-436634 545 1 Targets target NNS 10_1101-436634 545 2 . . . 10_1101-436634 546 1 Ther ther RB 10_1101-436634 546 2 . . . 10_1101-436634 547 1 7 7 CD 10_1101-436634 547 2 , , , 10_1101-436634 547 3 57–68 57–68 CD 10_1101-436634 547 4 ( ( -LRB- 10_1101-436634 547 5 2013 2013 CD 10_1101-436634 547 6 ) ) -RRB- 10_1101-436634 547 7 . . . 10_1101-436634 548 1 34 34 CD 10_1101-436634 548 2 . . . 10_1101-436634 549 1 Tokheim Tokheim NNP 10_1101-436634 549 2 , , , 10_1101-436634 549 3 C. C. NNP 10_1101-436634 549 4 & & CC 10_1101-436634 549 5 Karchin Karchin NNP 10_1101-436634 549 6 , , , 10_1101-436634 549 7 R. R. NNP 10_1101-436634 549 8 CHASMplus CHASMplus NNP 10_1101-436634 549 9 Reveals reveal VBZ 10_1101-436634 549 10 the the DT 10_1101-436634 549 11 Scope Scope NNP 10_1101-436634 549 12 of of IN 10_1101-436634 549 13 Somatic Somatic NNP 10_1101-436634 549 14 Missense Missense NNP 10_1101-436634 549 15 Mutations mutation NNS 10_1101-436634 549 16 Driving Driving NNP 10_1101-436634 549 17 Human Human NNP 10_1101-436634 549 18 Cancers Cancers NNPS 10_1101-436634 549 19 . . . 10_1101-436634 550 1 Cell Cell NNP 10_1101-436634 550 2 Syst Syst NNP 10_1101-436634 550 3 9 9 CD 10_1101-436634 550 4 , , , 10_1101-436634 550 5 9–23.e8 9–23.e8 CD 10_1101-436634 550 6 ( ( -LRB- 10_1101-436634 550 7 2019 2019 CD 10_1101-436634 550 8 ) ) -RRB- 10_1101-436634 550 9 . . . 
10_1101-436634 551 1 35 35 CD 10_1101-436634 551 2 . . . 10_1101-436634 552 1 Bicknell Bicknell NNP 10_1101-436634 552 2 , , , 10_1101-436634 552 3 D. D. NNP 10_1101-436634 552 4 C. C. NNP 10_1101-436634 552 5 , , , 10_1101-436634 552 6 Kaklamanis Kaklamanis NNP 10_1101-436634 552 7 , , , 10_1101-436634 552 8 L. L. NNP 10_1101-436634 552 9 , , , 10_1101-436634 552 10 Hampson Hampson NNP 10_1101-436634 552 11 , , , 10_1101-436634 552 12 R. R. NNP 10_1101-436634 552 13 , , , 10_1101-436634 552 14 Bodmer Bodmer NNP 10_1101-436634 552 15 , , , 10_1101-436634 552 16 W. W. NNP 10_1101-436634 552 17 F. F. NNP 10_1101-436634 552 18 & & CC 10_1101-436634 552 19 Karran Karran NNP 10_1101-436634 552 20 , , , 10_1101-436634 552 21 P. P. NNP 10_1101-436634 552 22 Selection Selection NNP 10_1101-436634 552 23 for for IN 10_1101-436634 552 24 β2- β2- NNP 10_1101-436634 552 25 microglobulin microglobulin NNP 10_1101-436634 552 26 mutation mutation NN 10_1101-436634 552 27 in in IN 10_1101-436634 552 28 mismatch mismatch NN 10_1101-436634 552 29 repair repair NN 10_1101-436634 552 30 - - HYPH 10_1101-436634 552 31 defective defective JJ 10_1101-436634 552 32 colorectal colorectal JJ 10_1101-436634 552 33 carcinomas carcinoma NNS 10_1101-436634 552 34 . . . 10_1101-436634 553 1 Curr curr UH 10_1101-436634 553 2 . . . 10_1101-436634 554 1 Biol Biol NNP 10_1101-436634 554 2 . . . 10_1101-436634 555 1 6 6 CD 10_1101-436634 555 2 , , , 10_1101-436634 555 3 1695–1697 1695–1697 CD 10_1101-436634 555 4 ( ( -LRB- 10_1101-436634 555 5 1996 1996 CD 10_1101-436634 555 6 ) ) -RRB- 10_1101-436634 555 7 . . . 10_1101-436634 556 1 36 36 CD 10_1101-436634 556 2 . . . 10_1101-436634 557 1 Bonneville Bonneville NNP 10_1101-436634 557 2 , , , 10_1101-436634 557 3 R. R. NNP 10_1101-436634 557 4 et et NNP 10_1101-436634 557 5 al al NNP 10_1101-436634 557 6 . . . 
10_1101-436634 558 1 Landscape Landscape NNP 10_1101-436634 558 2 of of IN 10_1101-436634 558 3 Microsatellite Microsatellite NNP 10_1101-436634 558 4 Instability Instability NNP 10_1101-436634 558 5 Across Across NNP 10_1101-436634 558 6 39 39 CD 10_1101-436634 558 7 Cancer Cancer NNP 10_1101-436634 558 8 Types Types NNPS 10_1101-436634 558 9 . . . 10_1101-436634 559 1 JCO JCO NNP 10_1101-436634 559 2 Precis Precis NNP 10_1101-436634 559 3 Oncol Oncol NNP 10_1101-436634 559 4 2017 2017 CD 10_1101-436634 559 5 , , , 10_1101-436634 559 6 ( ( -LRB- 10_1101-436634 559 7 2017 2017 CD 10_1101-436634 559 8 ) ) -RRB- 10_1101-436634 559 9 . . . 10_1101-436634 560 1 37 37 CD 10_1101-436634 560 2 . . . 10_1101-436634 561 1 Kloor Kloor NNP 10_1101-436634 561 2 , , , 10_1101-436634 561 3 M. M. NNP 10_1101-436634 561 4 et et NNP 10_1101-436634 561 5 al al NNP 10_1101-436634 561 6 . . . 10_1101-436634 562 1 Immunoselective immunoselective JJ 10_1101-436634 562 2 pressure pressure NN 10_1101-436634 562 3 and and CC 10_1101-436634 562 4 human human JJ 10_1101-436634 562 5 leukocyte leukocyte NN 10_1101-436634 562 6 antigen antigen NNP 10_1101-436634 562 7 class class NNP 10_1101-436634 562 8 I -PRON- PRP 10_1101-436634 562 9 antigen antigen VBP
10_1101-436634 563 31 machinery machinery NN 10_1101-436634 563 32 defects defect NNS 10_1101-436634 563 33 in in IN 10_1101-436634 563 34 microsatellite microsatellite NNP 10_1101-436634 563 35 unstable unstable JJ 10_1101-436634 563 36 colorectal colorectal JJ 10_1101-436634 563 37 cancers cancer NNS 10_1101-436634 563 38 . . . 10_1101-436634 564 1 Cancer cancer NN 10_1101-436634 564 2 Res Res NNP 10_1101-436634 564 3 . . . 10_1101-436634 565 1 65 65 CD 10_1101-436634 565 2 , , , 10_1101-436634 565 3 6418 6418 CD 10_1101-436634 565 4 – – : 10_1101-436634 565 5 6424 6424 CD 10_1101-436634 565 6 ( ( -LRB- 10_1101-436634 565 7 2005 2005 CD 10_1101-436634 565 8 ) ) -RRB- 10_1101-436634 565 9 . . . 10_1101-436634 566 1 38 38 CD 10_1101-436634 566 2 . . . 10_1101-436634 567 1 Sade Sade NNP 10_1101-436634 567 2 - - HYPH 10_1101-436634 567 3 Feldman Feldman NNP 10_1101-436634 567 4 , , , 10_1101-436634 567 5 M. M. NNP 10_1101-436634 567 6 et et NNP 10_1101-436634 567 7 al al NNP 10_1101-436634 567 8 . . . 10_1101-436634 568 1 Resistance resistance NN 10_1101-436634 568 2 to to TO 10_1101-436634 568 3 checkpoint checkpoint VB 10_1101-436634 568 4 blockade blockade NN 10_1101-436634 568 5 therapy therapy NN 10_1101-436634 568 6 through through IN 10_1101-436634 568 7 inactivation inactivation NN 10_1101-436634 568 8 of of IN 10_1101-436634 568 9 antigen antigen NNP 10_1101-436634 568 10 presentation presentation NNP 10_1101-436634 568 11 . . . 10_1101-436634 569 1 Nat Nat NNP 10_1101-436634 569 2 . . .
10_1101-436634 570 1 Commun Commun VBN 10_1101-436634 570 2 . . . 10_1101-436634 571 1 8 8 CD 10_1101-436634 571 2 , , , 10_1101-436634 571 3 1136 1136 CD 10_1101-436634 571 4 ( ( -LRB- 10_1101-436634 571 5 2017 2017 CD 10_1101-436634 571 6 ) ) -RRB- 10_1101-436634 571 7 . . . 10_1101-436634 572 1 39 39 CD 10_1101-436634 572 2 . . . 10_1101-436634 573 1 Seliger Seliger NNP 10_1101-436634 573 2 , , , 10_1101-436634 573 3 B. B. NNP 10_1101-436634 573 4 , , , 10_1101-436634 573 5 Maeurer Maeurer NNP 10_1101-436634 573 6 , , , 10_1101-436634 573 7 M. M. NNP 10_1101-436634 573 8 J. J. NNP 10_1101-436634 574 1 & & CC 10_1101-436634 574 2 Ferrone Ferrone NNP 10_1101-436634 574 3 , , , 10_1101-436634 574 4 S. S. NNP 10_1101-436634 574 5 Antigen Antigen NNP 10_1101-436634 574 6 - - HYPH 10_1101-436634 574 7 processing process VBG 10_1101-436634 574 8 machinery machinery NN 10_1101-436634 574 9 breakdown breakdown NN 10_1101-436634 574 10 and and CC 10_1101-436634 574 11 tumor tumor NN 10_1101-436634 574 12 growth growth NN 10_1101-436634 574 13 . . . 10_1101-436634 575 1 Immunol immunol NN 10_1101-436634 575 2 . . . 10_1101-436634 576 1 Today today NN 10_1101-436634 576 2 21 21 CD 10_1101-436634 576 3 , , , 10_1101-436634 576 4 455–464 455–464 CD 10_1101-436634 576 5 ( ( -LRB- 10_1101-436634 576 6 2000 2000 CD 10_1101-436634 576 7 ) ) -RRB- 10_1101-436634 576 8 . . . 10_1101-436634 577 1 40 40 CD 10_1101-436634 577 2 . . . 10_1101-436634 578 1 Güssow Güssow NNP 10_1101-436634 578 2 , , , 10_1101-436634 578 3 D. D. NNP 10_1101-436634 578 4 et et NNP 10_1101-436634 578 5 al al NNP 10_1101-436634 578 6 . . . 10_1101-436634 579 1 The the DT 10_1101-436634 579 2 human human JJ 10_1101-436634 579 3 beta beta NN 10_1101-436634 579 4 2-microglobulin 2-microglobulin CD 10_1101-436634 579 5 gene gene NN 10_1101-436634 579 6 . . . 
10_1101-436634 580 1 Primary primary JJ 10_1101-436634 580 2 structure structure NN 10_1101-436634 580 3 and and CC 10_1101-436634 580 4 definition definition NN 10_1101-436634 580 5 of of IN 10_1101-436634 580 6 the the DT 10_1101-436634 580 7 transcriptional transcriptional JJ 10_1101-436634 580 8 unit unit NN 10_1101-436634 580 9 . . . 10_1101-436634 581 1 J. J. NNP 10_1101-436634 581 2 Immunol Immunol NNP 10_1101-436634 581 3 . . . 10_1101-436634 582 1 139 139 CD 10_1101-436634 582 2 , , , 10_1101-436634 582 3 3132–3138 3132–3138 CD 10_1101-436634 582 4 ( ( -LRB- 10_1101-436634 582 5 1987 1987 CD 10_1101-436634 582 6 ) ) -RRB- 10_1101-436634 582 7 . . . 10_1101-436634 583 1 41 41 CD 10_1101-436634 583 2 . . . 10_1101-436634 584 1 Wang Wang NNP 10_1101-436634 584 2 , , , 10_1101-436634 584 3 L. L. NNP 10_1101-436634 584 4 , , , 10_1101-436634 584 5 Yin Yin NNP 10_1101-436634 584 6 , , , 10_1101-436634 584 7 W. W. NNP 10_1101-436634 584 8 & & CC 10_1101-436634 584 9 Shi Shi NNP 10_1101-436634 584 10 , , , 10_1101-436634 584 11 C. C. NNP 10_1101-436634 584 12 E3 E3 NNP 10_1101-436634 584 13 ubiquitin ubiquitin JJ 10_1101-436634 584 14 ligase ligase NN 10_1101-436634 584 15 , , , 10_1101-436634 584 16 RNF139 RNF139 NNP 10_1101-436634 584 17 , , , 10_1101-436634 584 18 inhibits inhibit VBZ 10_1101-436634 584 19 the the DT 10_1101-436634 584 20 progression progression NN 10_1101-436634 584 21 of of IN 10_1101-436634 584 22 tongue tongue NN 10_1101-436634 584 23 cancer cancer NN 10_1101-436634 584 24 . . . 10_1101-436634 585 1 BMC BMC NNP 10_1101-436634 585 2 Cancer Cancer NNP 10_1101-436634 585 3 17 17 CD 10_1101-436634 585 4 , , , 10_1101-436634 585 5 452 452 CD 10_1101-436634 585 6 ( ( -LRB- 10_1101-436634 585 7 2017 2017 CD 10_1101-436634 585 8 ) ) -RRB- 10_1101-436634 585 9 . . . 10_1101-436634 586 1 42 42 CD 10_1101-436634 586 2 . . . 10_1101-436634 587 1 Hornbeck Hornbeck NNP 10_1101-436634 587 2 , , , 10_1101-436634 587 3 P. P. NNP 10_1101-436634 587 4 V. V. 
NNP 10_1101-436634 587 5 et et NNP 10_1101-436634 587 6 al al NNP 10_1101-436634 587 7 . . . 10_1101-436634 588 1 PhosphoSitePlus PhosphoSitePlus NNP 10_1101-436634 588 2 , , , 10_1101-436634 588 3 2014 2014 CD 10_1101-436634 588 4 : : : 10_1101-436634 588 5 mutations mutation NNS 10_1101-436634 588 6 , , , 10_1101-436634 588 7 PTMs ptm NNS 10_1101-436634 588 8 and and CC 10_1101-436634 588 9 recalibrations recalibration NNS 10_1101-436634 588 10 . . . 10_1101-436634 589 1 Nucleic Nucleic NNP 10_1101-436634 589 2 Acids Acids NNPS 10_1101-436634 589 3 Res Res NNP 10_1101-436634 589 4 . . . 10_1101-436634 590 1 43 43 CD 10_1101-436634 590 2 , , , 10_1101-436634 590 3 D512–20 D512–20 NNP 10_1101-436634 590 4 ( ( -LRB- 10_1101-436634 590 5 2015 2015 CD 10_1101-436634 590 6 ) ) -RRB- 10_1101-436634 590 7 . . . 10_1101-436634 591 1 43 43 CD 10_1101-436634 591 2 . . . 10_1101-436634 592 1 Zhao Zhao NNP 10_1101-436634 592 2 , , , 10_1101-436634 592 3 R. R. NNP 10_1101-436634 592 4 , , , 10_1101-436634 592 5 Choi Choi NNP 10_1101-436634 592 6 , , , 10_1101-436634 592 7 B. B. NNP 10_1101-436634 592 8 Y. Y. NNP 10_1101-436634 592 9 , , , 10_1101-436634 592 10 Lee Lee NNP 10_1101-436634 592 11 , , , 10_1101-436634 592 12 M.-H. M.-H. NNP 10_1101-436634 592 13 , , , 10_1101-436634 592 14 Bode Bode NNP 10_1101-436634 592 15 , , , 10_1101-436634 592 16 A. A. NNP 10_1101-436634 592 17 M. M. NNP 10_1101-436634 592 18 & & CC 10_1101-436634 592 19 Dong Dong NNP 10_1101-436634 592 20 , , , 10_1101-436634 592 21 Z. Z. 
NNP 10_1101-436634 593 1 Implications implication NNS 10_1101-436634 593 2 of of IN 10_1101-436634 593 3 Genetic Genetic NNP 10_1101-436634 593 4 and and CC 10_1101-436634 593 5 Epigenetic Epigenetic NNP 10_1101-436634 593 6 Alterations Alterations NNPS 10_1101-436634 593 7 of of IN 10_1101-436634 593 8 CDKN2A CDKN2A NNP 10_1101-436634 593 9 ( ( -LRB- 10_1101-436634 593 10 p16(INK4a p16(INK4a NNP 10_1101-436634 593 11 ) ) -RRB- 10_1101-436634 593 12 ) ) -RRB- 10_1101-436634 593 13 in in IN 10_1101-436634 593 14 Cancer Cancer NNP 10_1101-436634 593 15 . . . 10_1101-436634 594 1 EBioMedicine EBioMedicine NNP 10_1101-436634 594 2 8 8 CD 10_1101-436634 594 3 , , , 10_1101-436634 594 4 30–39 30–39 CD 10_1101-436634 594 5 ( ( -LRB- 10_1101-436634 594 6 2016 2016 CD 10_1101-436634 594 7 ) ) -RRB- 10_1101-436634 594 8 . . . 10_1101-436634 595 1 44 44 CD 10_1101-436634 595 2 . . . 10_1101-436634 596 1 Gump Gump NNP 10_1101-436634 596 2 , , , 10_1101-436634 596 3 J. J. NNP 10_1101-436634 596 4 , , , 10_1101-436634 596 5 Stokoe Stokoe NNP 10_1101-436634 596 6 , , , 10_1101-436634 596 7 D. D. NNP 10_1101-436634 596 8 & & CC 10_1101-436634 596 9 McCormick McCormick NNP 10_1101-436634 596 10 , , , 10_1101-436634 596 11 F. F. NNP 10_1101-436634 596 12 Phosphorylation Phosphorylation NNP 10_1101-436634 596 13 of of IN 10_1101-436634 596 14 p16 p16 NN 10_1101-436634 596 15 INK4A ink4a NN 10_1101-436634 596 16 Correlates Correlates NNPS 10_1101-436634 596 17 with with IN 10_1101-436634 596 18 Cdk4 Cdk4 NNP 10_1101-436634 596 19 Association Association NNP 10_1101-436634 596 20 . . . 10_1101-436634 597 1 J. J. NNP 10_1101-436634 597 2 Biol Biol NNP 10_1101-436634 597 3 . . . 10_1101-436634 598 1 Chem Chem NNP 10_1101-436634 598 2 . . . 10_1101-436634 599 1 278 278 CD 10_1101-436634 599 2 , , , 10_1101-436634 599 3 6619–6622 6619–6622 CD 10_1101-436634 599 4 ( ( -LRB- 10_1101-436634 599 5 2003 2003 CD 10_1101-436634 599 6 ) ) -RRB- 10_1101-436634 599 7 . . . 
10_1101-436634 600 1 45 45 CD 10_1101-436634 600 2 . . . 10_1101-436634 601 1 Quinlan Quinlan NNP 10_1101-436634 601 2 , , , 10_1101-436634 601 3 A. A. NNP 10_1101-436634 601 4 R. R. NNP 10_1101-436634 601 5 BEDTools bedtool NNS 10_1101-436634 601 6 : : : 10_1101-436634 601 7 The the DT 10_1101-436634 601 8 Swiss Swiss NNP 10_1101-436634 601 9 - - HYPH 10_1101-436634 601 10 Army Army NNP 10_1101-436634 601 11 Tool Tool NNP 10_1101-436634 601 12 for for IN 10_1101-436634 601 13 Genome Genome NNP 10_1101-436634 601 14 Feature Feature NNP 10_1101-436634 601 15 Analysis Analysis NNP 10_1101-436634 601 16 . . . 10_1101-436634 602 1 Curr curr UH 10_1101-436634 602 2 . . . 10_1101-436634 603 1 Protoc Protoc NNP 10_1101-436634 603 2 . . . 10_1101-436634 604 1 Bioinformatics Bioinformatics NNP 10_1101-436634 604 2 47 47 CD 10_1101-436634 604 3 , , , 10_1101-436634 604 4 11.12.1–34 11.12.1–34 CD 10_1101-436634 604 5 ( ( -LRB- 10_1101-436634 604 6 2014 2014 CD 10_1101-436634 604 7 ) ) -RRB- 10_1101-436634 604 8 . . . 10_1101-436634 605 1 46 46 CD 10_1101-436634 605 2 . . . 10_1101-436634 606 1 Li Li NNP 10_1101-436634 606 2 , , , 10_1101-436634 606 3 H. H. NNP 10_1101-436634 606 4 Tabix Tabix NNP 10_1101-436634 606 5 : : : 10_1101-436634 606 6 fast fast JJ 10_1101-436634 606 7 retrieval retrieval NN 10_1101-436634 606 8 of of IN 10_1101-436634 606 9 sequence sequence NN 10_1101-436634 606 10 features feature VBZ 10_1101-436634 606 11 from from IN 10_1101-436634 606 12 generic generic JJ 10_1101-436634 606 13 TAB TAB NNP 10_1101-436634 606 14 - - HYPH 10_1101-436634 606 15 delimited delimit VBN 10_1101-436634 606 16 files file NNS 10_1101-436634 606 17 . . . 10_1101-436634 607 1 Bioinformatics Bioinformatics NNP 10_1101-436634 607 2 27 27 CD 10_1101-436634 607 3 , , , 10_1101-436634 607 4 718–719 718–719 CD 10_1101-436634 607 5 ( ( -LRB- 10_1101-436634 607 6 2011 2011 CD 10_1101-436634 607 7 ) ) -RRB- 10_1101-436634 607 8 . . . 
10_1101-436634 608 1 47 47 CD 10_1101-436634 608 2 . . . 10_1101-436634 609 1 GDC GDC NNP 10_1101-436634 609 2 Data Data NNP 10_1101-436634 609 3 Processing Processing NNP 10_1101-436634 609 4 . . . 10_1101-436634 610 1 https://gdc.cancer.gov/about-data/gdc-data-processing https://gdc.cancer.gov/about-data/gdc-data-processing NN 10_1101-436634 610 2 . . . 10_1101-436634 611 1 48 48 CD 10_1101-436634 611 2 . . . 10_1101-436634 612 1 Fan Fan NNP 10_1101-436634 612 2 , , , 10_1101-436634 612 3 Y. Y. NNP 10_1101-436634 612 4 et et NNP 10_1101-436634 612 5 al al NNP 10_1101-436634 612 6 . . . 10_1101-436634 613 1 Accounting account VBG 10_1101-436634 613 2 for for IN 10_1101-436634 613 3 tumor tumor NN 10_1101-436634 613 4 heterogeneity heterogeneity NN 10_1101-436634 613 5 using use VBG 10_1101-436634 613 6 a a DT 10_1101-436634 613 7 sample sample NN 10_1101-436634 613 8 - - HYPH 10_1101-436634 613 9 specific specific JJ 10_1101-436634 613 10 error error NN 10_1101-436634 613 11 model model NN 10_1101-436634 613 12 improves improve VBZ 10_1101-436634 613 13 sensitivity sensitivity NN 10_1101-436634 613 14 and and CC 10_1101-436634 613 15 specificity specificity NN 10_1101-436634 613 16 in in IN 10_1101-436634 613 17 mutation mutation NN 10_1101-436634 613 18 calling call VBG 10_1101-436634 613 19 for for IN 10_1101-436634 613 20 sequencing sequence VBG 10_1101-436634 613 21 data datum NNS 10_1101-436634 613 22 . . . 10_1101-436634 614 1 bioRxiv biorxiv IN 10_1101-436634 614 2 055467 055467 CD 10_1101-436634 614 3 ( ( -LRB- 10_1101-436634 614 4 2016 2016 CD 10_1101-436634 614 5 ) ) -RRB- 10_1101-436634 614 6 doi:10.1101/055467 doi:10.1101/055467 NN 10_1101-436634 614 7 . . . 10_1101-436634 615 1 49 49 CD 10_1101-436634 615 2 . . . 10_1101-436634 616 1 Cibulskis Cibulskis NNP 10_1101-436634 616 2 , , , 10_1101-436634 616 3 K. K. NNP 10_1101-436634 616 4 et et NNP 10_1101-436634 616 5 al al NNP 10_1101-436634 616 6 . . . 
10_1101-436634 617 1 Sensitive sensitive JJ 10_1101-436634 617 2 detection detection NN 10_1101-436634 617 3 of of IN 10_1101-436634 617 4 somatic somatic JJ 10_1101-436634 617 5 point point NN 10_1101-436634 617 6 mutations mutation NNS 10_1101-436634 617 7 in in IN 10_1101-436634 617 8 impure impure NN 10_1101-436634 617 9 and and CC 10_1101-436634 617 10 heterogeneous heterogeneous JJ 10_1101-436634 617 11 cancer cancer NN 10_1101-436634 617 12 samples sample NNS 10_1101-436634 617 13 . . . 10_1101-436634 618 1 Nat Nat NNP 10_1101-436634 618 2 . . . 10_1101-436634 619 1 Biotechnol Biotechnol NNS 10_1101-436634 619 2 . . . 10_1101-436634 620 1 31 31 CD 10_1101-436634 620 2 , , , 10_1101-436634 620 3 213–219 213–219 CD 10_1101-436634 620 4 ( ( -LRB- 10_1101-436634 620 5 2013 2013 CD 10_1101-436634 620 6 ) ) -RRB- 10_1101-436634 620 7 . . . 
10_1101-436634 622 31 50 50 CD 10_1101-436634 622 32 . . . 10_1101-436634 623 1 Koboldt Koboldt NNP 10_1101-436634 623 2 , , , 10_1101-436634 623 3 D. D. NNP 10_1101-436634 623 4 C. C. NNP 10_1101-436634 623 5 et et NNP 10_1101-436634 623 6 al al NNP 10_1101-436634 623 7 . . . 
10_1101-436634 624 1 VarScan VarScan NNP 10_1101-436634 624 2 2 2 CD 10_1101-436634 624 3 : : : 10_1101-436634 624 4 somatic somatic JJ 10_1101-436634 624 5 mutation mutation NN 10_1101-436634 624 6 and and CC 10_1101-436634 624 7 copy copy NN 10_1101-436634 624 8 number number NN 10_1101-436634 624 9 alteration alteration NN 10_1101-436634 624 10 discovery discovery NN 10_1101-436634 624 11 in in IN 10_1101-436634 624 12 cancer cancer NN 10_1101-436634 624 13 by by IN 10_1101-436634 624 14 exome exome NNP 10_1101-436634 624 15 sequencing sequencing NN 10_1101-436634 624 16 . . . 10_1101-436634 625 1 Genome Genome NNP 10_1101-436634 625 2 Res Res NNP 10_1101-436634 625 3 . . . 10_1101-436634 626 1 22 22 CD 10_1101-436634 626 2 , , , 10_1101-436634 626 3 568–576 568–576 CD 10_1101-436634 626 4 ( ( -LRB- 10_1101-436634 626 5 2012 2012 CD 10_1101-436634 626 6 ) ) -RRB- 10_1101-436634 626 7 . . . 10_1101-436634 627 1 51 51 CD 10_1101-436634 627 2 . . . 10_1101-436634 628 1 Larson Larson NNP 10_1101-436634 628 2 , , , 10_1101-436634 628 3 D. D. NNP 10_1101-436634 628 4 E. E. NNP 10_1101-436634 628 5 et et NNP 10_1101-436634 628 6 al al NNP 10_1101-436634 628 7 . . . 10_1101-436634 629 1 SomaticSniper SomaticSniper NNP 10_1101-436634 629 2 : : : 10_1101-436634 629 3 identification identification NN 10_1101-436634 629 4 of of IN 10_1101-436634 629 5 somatic somatic JJ 10_1101-436634 629 6 point point NN 10_1101-436634 629 7 mutations mutation NNS 10_1101-436634 629 8 in in IN 10_1101-436634 629 9 whole whole JJ 10_1101-436634 629 10 genome genome JJ 10_1101-436634 629 11 sequencing sequencing NN 10_1101-436634 629 12 data datum NNS 10_1101-436634 629 13 . . . 10_1101-436634 630 1 Bioinformatics Bioinformatics NNP 10_1101-436634 630 2 28 28 CD 10_1101-436634 630 3 , , , 10_1101-436634 630 4 311–317 311–317 CD 10_1101-436634 630 5 ( ( -LRB- 10_1101-436634 630 6 2012 2012 CD 10_1101-436634 630 7 ) ) -RRB- 10_1101-436634 630 8 . . . 
10_1101-436634 631 1 52 52 CD 10_1101-436634 631 2 . . . 10_1101-436634 632 1 Griffith Griffith NNP 10_1101-436634 632 2 , , , 10_1101-436634 632 3 M. M. NNP 10_1101-436634 632 4 et et NNP 10_1101-436634 632 5 al al NNP 10_1101-436634 632 6 . . . 10_1101-436634 633 1 Genome Genome NNP 10_1101-436634 633 2 Modeling modeling NN 10_1101-436634 633 3 System system NN 10_1101-436634 633 4 : : : 10_1101-436634 633 5 A a DT 10_1101-436634 633 6 Knowledge Knowledge NNP 10_1101-436634 633 7 Management Management NNP 10_1101-436634 633 8 Platform Platform NNP 10_1101-436634 633 9 for for IN 10_1101-436634 633 10 Genomics Genomics NNP 10_1101-436634 633 11 . . . 10_1101-436634 634 1 PLoS PLoS : 10_1101-436634 634 2 Comput Comput NNP 10_1101-436634 634 3 . . . 10_1101-436634 635 1 Biol Biol NNP 10_1101-436634 635 2 . . . 10_1101-436634 636 1 11 11 CD 10_1101-436634 636 2 , , , 10_1101-436634 636 3 e1004274 e1004274 NNP 10_1101-436634 636 4 ( ( -LRB- 10_1101-436634 636 5 2015 2015 CD 10_1101-436634 636 6 ) ) -RRB- 10_1101-436634 636 7 . . . 10_1101-436634 637 1 53 53 CD 10_1101-436634 637 2 . . . 10_1101-436634 638 1 Li Li NNP 10_1101-436634 638 2 , , , 10_1101-436634 638 3 H. H. NNP 10_1101-436634 638 4 & & CC 10_1101-436634 638 5 Durbin Durbin NNP 10_1101-436634 638 6 , , , 10_1101-436634 638 7 R. R. NNP 10_1101-436634 638 8 Fast Fast NNP 10_1101-436634 638 9 and and CC 10_1101-436634 638 10 accurate accurate JJ 10_1101-436634 638 11 short short JJ 10_1101-436634 638 12 read read NN 10_1101-436634 638 13 alignment alignment NN 10_1101-436634 638 14 with with IN 10_1101-436634 638 15 Burrows Burrows NNP 10_1101-436634 638 16 - - HYPH 10_1101-436634 638 17 Wheeler Wheeler NNP 10_1101-436634 638 18 transform transform NN 10_1101-436634 638 19 . . . 
10_1101-436634 639 1 Bioinformatics Bioinformatics NNP 10_1101-436634 639 2 25 25 CD 10_1101-436634 639 3 , , , 10_1101-436634 639 4 1754–1760 1754–1760 CD 10_1101-436634 639 5 ( ( -LRB- 10_1101-436634 639 6 2009 2009 CD 10_1101-436634 639 7 ) ) -RRB- 10_1101-436634 639 8 . . . 10_1101-436634 640 1 54 54 CD 10_1101-436634 640 2 . . . 10_1101-436634 641 1 Saunders Saunders NNP 10_1101-436634 641 2 , , , 10_1101-436634 641 3 C. C. NNP 10_1101-436634 641 4 T. T. NNP 10_1101-436634 641 5 et et NNP 10_1101-436634 641 6 al al NNP 10_1101-436634 641 7 . . . 10_1101-436634 642 1 Strelka strelka NN 10_1101-436634 642 2 : : : 10_1101-436634 642 3 accurate accurate JJ 10_1101-436634 642 4 somatic somatic JJ 10_1101-436634 642 5 small small JJ 10_1101-436634 642 6 - - HYPH 10_1101-436634 642 7 variant variant JJ 10_1101-436634 642 8 calling calling NN 10_1101-436634 642 9 from from IN 10_1101-436634 642 10 sequenced sequenced JJ 10_1101-436634 642 11 tumor tumor NN 10_1101-436634 642 12 – – : 10_1101-436634 642 13 normal normal JJ 10_1101-436634 642 14 sample sample NN 10_1101-436634 642 15 pairs pair NNS 10_1101-436634 642 16 . . . 10_1101-436634 643 1 Bioinformatics Bioinformatics NNP 10_1101-436634 643 2 28 28 CD 10_1101-436634 643 3 , , , 10_1101-436634 643 4 1811–1817 1811–1817 CD 10_1101-436634 643 5 ( ( -LRB- 10_1101-436634 643 6 2012 2012 CD 10_1101-436634 643 7 ) ) -RRB- 10_1101-436634 643 8 . . . 
10_1101-436634 645 31 Main Main NNP 10_1101-436634 645 32 Figures Figures NNPS 10_1101-436634 645 33 Figure figure NN 10_1101-436634 645 34 1 1 CD 10_1101-436634 645 35 : : : 10_1101-436634 645 36 Flexible flexible JJ 10_1101-436634 645 37 , , , 10_1101-436634 645 38 streamlined streamlined JJ 10_1101-436634 645 39 discovery discovery NN 10_1101-436634 645 40 of of IN 10_1101-436634 645 41 cis cis NN 10_1101-436634 645 42 - - HYPH 10_1101-436634 645 43 acting act VBG 10_1101-436634 645 44 splice splice NN 10_1101-436634 645 45 variants variant NNS 10_1101-436634 645 46 with with IN 10_1101-436634 645 47 RegTools RegTools NNP 10_1101-436634 645 48 modules module NNS 10_1101-436634 645 49 and and CC 10_1101-436634 645 50 cis cis NN 10_1101-436634 645 51 - - HYPH 10_1101-436634 645 52 splice splice NN 10_1101-436634 645 53 - - HYPH 10_1101-436634 645 54 effects effect NNS 10_1101-436634 645 55 identify identify VBP 10_1101-436634 645 56 workflow workflow NN 10_1101-436634 645 57 . . . 
10_1101-436634 646 1 A a LS 10_1101-436634 646 2 ) ) -RRB- 10_1101-436634 646 3 By by IN 10_1101-436634 646 4 default default NN 10_1101-436634 646 5 , , , 10_1101-436634 646 6 variants variant NNS 10_1101-436634 646 7 annotate annotate VBP 10_1101-436634 646 8 marks mark NNS 10_1101-436634 646 9 variants variant NNS 10_1101-436634 646 10 within within IN 10_1101-436634 646 11 3bp 3bp NN 10_1101-436634 646 12 on on IN 10_1101-436634 646 13 the the DT 10_1101-436634 646 14 exonic exonic JJ 10_1101-436634 646 15 side side NN 10_1101-436634 646 16 and and CC 10_1101-436634 646 17 2bp 2bp NN 10_1101-436634 646 18 on on IN 10_1101-436634 646 19 the the DT 10_1101-436634 646 20 intronic intronic JJ 10_1101-436634 646 21 side side NN 10_1101-436634 646 22 of of IN 10_1101-436634 646 23 an an DT 10_1101-436634 646 24 exon exon JJ 10_1101-436634 646 25 edge edge NN 10_1101-436634 646 26 as as IN 10_1101-436634 646 27 potentially potentially RB 10_1101-436634 646 28 splicing splice VBG 10_1101-436634 646 29 - - HYPH 10_1101-436634 646 30 relevant relevant JJ 10_1101-436634 646 31 . . . 
10_1101-436634 647 1 This this DT 10_1101-436634 647 2 “ " `` 10_1101-436634 647 3 splice splice NN 10_1101-436634 647 4 variant variant JJ 10_1101-436634 647 5 window window NN 10_1101-436634 647 6 ” " '' 10_1101-436634 647 7 can can MD 10_1101-436634 647 8 be be VB 10_1101-436634 647 9 modified modify VBN 10_1101-436634 647 10 individually individually RB 10_1101-436634 647 11 for for IN 10_1101-436634 647 12 the the DT 10_1101-436634 647 13 exonic exonic JJ 10_1101-436634 647 14 side side NN 10_1101-436634 647 15 and and CC 10_1101-436634 647 16 intronic intronic JJ 10_1101-436634 647 17 side side NN 10_1101-436634 647 18 using use VBG 10_1101-436634 647 19 the the DT 10_1101-436634 647 20 “ " `` 10_1101-436634 647 21 -e -e FW 10_1101-436634 647 22 ” " '' 10_1101-436634 647 23 and and CC 10_1101-436634 647 24 “ " `` 10_1101-436634 647 25 -i -i : 10_1101-436634 647 26 ” " '' 10_1101-436634 647 27 options option NNS 10_1101-436634 647 28 , , , 10_1101-436634 647 29 respectively respectively RB 10_1101-436634 647 30 . . . 
10_1101-436634 648 1 With with IN 10_1101-436634 648 2 cis cis NN 10_1101-436634 648 3 - - HYPH 10_1101-436634 648 4 splice splice NN 10_1101-436634 648 5 - - HYPH 10_1101-436634 648 6 effects effect NNS 10_1101-436634 648 7 identify identify VBP 10_1101-436634 648 8 , , , 10_1101-436634 648 9 for for IN 10_1101-436634 648 10 each each DT 10_1101-436634 648 11 variant variant NN 10_1101-436634 648 12 in in IN 10_1101-436634 648 13 the the DT 10_1101-436634 648 14 splice splice NN 10_1101-436634 648 15 variant variant JJ 10_1101-436634 648 16 window window NN 10_1101-436634 648 17 , , , 10_1101-436634 648 18 a a DT 10_1101-436634 648 19 “ " `` 10_1101-436634 648 20 splice splice NN 10_1101-436634 648 21 junction junction NN 10_1101-436634 648 22 region region NN 10_1101-436634 648 23 ” " '' 10_1101-436634 648 24 is be VBZ 10_1101-436634 648 25 determined determine VBN 10_1101-436634 648 26 by by IN 10_1101-436634 648 27 finding find VBG 10_1101-436634 648 28 the the DT 10_1101-436634 648 29 largest large JJS 10_1101-436634 648 30 span span NN 10_1101-436634 648 31 of of IN 10_1101-436634 648 32 sequence sequence NN 10_1101-436634 648 33 space space NN 10_1101-436634 648 34 between between IN 10_1101-436634 648 35 exons exon NNS 10_1101-436634 648 36 which which WDT 10_1101-436634 648 37 flank flank VBP 10_1101-436634 648 38 the the DT 10_1101-436634 648 39 exon exon NNS 10_1101-436634 648 40 associated associate VBN 10_1101-436634 648 41 with with IN 10_1101-436634 648 42 the the DT 10_1101-436634 648 43 splicing splicing NN 10_1101-436634 648 44 - - HYPH 10_1101-436634 648 45 relevant relevant JJ 10_1101-436634 648 46 variant variant NN 10_1101-436634 648 47 . . . 
10_1101-436634 649 1 The the DT 10_1101-436634 649 2 splice splice NN 10_1101-436634 649 3 junction junction NN 10_1101-436634 649 4 region region NN 10_1101-436634 649 5 can can MD 10_1101-436634 649 6 also also RB 10_1101-436634 649 7 be be VB 10_1101-436634 649 8 set set VBN 10_1101-436634 649 9 manually manually RB 10_1101-436634 649 10 to to TO 10_1101-436634 649 11 contain contain VB 10_1101-436634 649 12 the the DT 10_1101-436634 649 13 entire entire JJ 10_1101-436634 649 14 sequence sequence NN 10_1101-436634 649 15 space space NN 10_1101-436634 649 16 n n CC 10_1101-436634 649 17 bases basis NNS 10_1101-436634 649 18 upstream upstream JJ 10_1101-436634 649 19 and and CC 10_1101-436634 649 20 downstream downstream JJ 10_1101-436634 649 21 of of IN 10_1101-436634 649 22 the the DT 10_1101-436634 649 23 variant variant NN 10_1101-436634 649 24 using use VBG 10_1101-436634 649 25 the the DT 10_1101-436634 649 26 “ " `` 10_1101-436634 649 27 -w -w NN 10_1101-436634 649 28 ” " '' 10_1101-436634 649 29 option option NN 10_1101-436634 649 30 . . . 10_1101-436634 650 1 Junctions junction NNS 10_1101-436634 650 2 overlapping overlap VBG 10_1101-436634 650 3 the the DT 10_1101-436634 650 4 splice splice NN 10_1101-436634 650 5 junction junction NN 10_1101-436634 650 6 region region NN 10_1101-436634 650 7 are be VBP 10_1101-436634 650 8 associated associate VBN 10_1101-436634 650 9 with with IN 10_1101-436634 650 10 the the DT 10_1101-436634 650 11 variant variant NN 10_1101-436634 650 12 . . . 
10_1101-436634 651 1 Using use VBG 10_1101-436634 651 2 the the DT 10_1101-436634 651 3 -E -E HYPH 10_1101-436634 651 4 option option NN 10_1101-436634 651 5 considers consider VBZ 10_1101-436634 651 6 all all DT 10_1101-436634 651 7 exonic exonic JJ 10_1101-436634 651 8 variants variant NNS 10_1101-436634 651 9 as as IN 10_1101-436634 651 10 potentially potentially RB 10_1101-436634 651 11 splicing splice VBG 10_1101-436634 651 12 - - HYPH 10_1101-436634 651 13 relevant relevant JJ 10_1101-436634 651 14 , , , 10_1101-436634 651 15 but but CC 10_1101-436634 651 16 is be VBZ 10_1101-436634 651 17 otherwise otherwise RB 10_1101-436634 651 18 the the DT 10_1101-436634 651 19 same same JJ 10_1101-436634 651 20 . . . 10_1101-436634 652 1 The the DT 10_1101-436634 652 2 -I -I NNP 10_1101-436634 652 3 option option NN 10_1101-436634 652 4 considers consider VBZ 10_1101-436634 652 5 all all DT 10_1101-436634 652 6 intronic intronic JJ 10_1101-436634 652 7 variants variant NNS 10_1101-436634 652 8 and and CC 10_1101-436634 652 9 also also RB 10_1101-436634 652 10 limits limit VBZ 10_1101-436634 652 11 the the DT 10_1101-436634 652 12 splice splice NN 10_1101-436634 652 13 junction junction NN 10_1101-436634 652 14 region region NN 10_1101-436634 652 15 to to IN 10_1101-436634 652 16 the the DT 10_1101-436634 652 17 intronic intronic JJ 10_1101-436634 652 18 region region NN 10_1101-436634 652 19 in in IN 10_1101-436634 652 20 which which WDT 10_1101-436634 652 21 the the DT 10_1101-436634 652 22 variant variant NN 10_1101-436634 652 23 is be VBZ 10_1101-436634 652 24 found find VBN 10_1101-436634 652 25 , , , 10_1101-436634 652 26 excluding exclude VBG 10_1101-436634 652 27 the the DT 10_1101-436634 652 28 flanking flank VBG 10_1101-436634 652 29 exons exon NNS 10_1101-436634 652 30 . . . 
10_1101-436634 653 1 B B NNP 10_1101-436634 653 2 ) ) -RRB- 10_1101-436634 653 3 Cis Cis NNP 10_1101-436634 653 4 - - HYPH 10_1101-436634 653 5 splice splice NN 10_1101-436634 653 6 - - HYPH 10_1101-436634 653 7 effects effect NNS 10_1101-436634 653 8 identify identify VBP 10_1101-436634 653 9 and and CC 10_1101-436634 653 10 the the DT 10_1101-436634 653 11 underlying underlie VBG 10_1101-436634 653 12 junctions junction NNS 10_1101-436634 653 13 annotate annotate JJ 10_1101-436634 653 14 command command NN 10_1101-436634 653 15 annotate annotate NN 10_1101-436634 653 16 splicing splicing NN 10_1101-436634 653 17 events event NNS 10_1101-436634 653 18 based base VBN 10_1101-436634 653 19 on on IN 10_1101-436634 653 20 whether whether IN 10_1101-436634 653 21 the the DT 10_1101-436634 653 22 donor donor NN 10_1101-436634 653 23 and and CC 10_1101-436634 653 24 acceptor acceptor NN 10_1101-436634 653 25 site site NN 10_1101-436634 653 26 combination combination NN 10_1101-436634 653 27 is be VBZ 10_1101-436634 653 28 found find VBN 10_1101-436634 653 29 in in IN 10_1101-436634 653 30 the the DT 10_1101-436634 653 31 reference reference NN 10_1101-436634 653 32 transcriptome transcriptome DT 10_1101-436634 653 33 GTF GTF NNP 10_1101-436634 653 34 . . . 
10_1101-436634 654 1 In in IN 10_1101-436634 654 2 this this DT 10_1101-436634 654 3 example example NN 10_1101-436634 654 4 , , , 10_1101-436634 654 5 there there EX 10_1101-436634 654 6 are be VBP 10_1101-436634 654 7 two two CD 10_1101-436634 654 8 known know VBN 10_1101-436634 654 9 transcripts transcript NNS 10_1101-436634 654 10 ( ( -LRB- 10_1101-436634 654 11 shown show VBN 10_1101-436634 654 12 in in IN 10_1101-436634 654 13 blue blue NNP 10_1101-436634 654 14 ) ) -RRB- 10_1101-436634 654 15 which which WDT 10_1101-436634 654 16 overlap overlap VBP 10_1101-436634 654 17 a a DT 10_1101-436634 654 18 set set NN 10_1101-436634 654 19 of of IN 10_1101-436634 654 20 junctions junction NNS 10_1101-436634 654 21 from from IN 10_1101-436634 654 22 RNAseq RNAseq NNP 10_1101-436634 654 23 data datum NNS 10_1101-436634 654 24 ( ( -LRB- 10_1101-436634 654 25 depicted depict VBN 10_1101-436634 654 26 as as IN 10_1101-436634 654 27 junction junction NN 10_1101-436634 654 28 supporting support VBG 10_1101-436634 654 29 reads read NNS 10_1101-436634 654 30 .CC .CC : 10_1101-436634 654 31 - - : 10_1101-436634 654 32 BY by IN 10_1101-436634 654 33 - - HYPH 10_1101-436634 654 34 NC NC NNP 10_1101-436634 654 35 - - HYPH 10_1101-436634 654 36 ND ND NNP 10_1101-436634 654 37 4.0 4.0 CD 10_1101-436634 654 38 International international JJ 10_1101-436634 654 39 licensea licensea NNS 10_1101-436634 654 40 certified certify VBN 10_1101-436634 654 41 by by IN 10_1101-436634 654 42 peer peer NN 10_1101-436634 654 43 review review NN 10_1101-436634 654 44 ) ) -RRB- 10_1101-436634 654 45 is be VBZ 10_1101-436634 654 46 the the DT 10_1101-436634 654 47 author author NN 10_1101-436634 654 48 / / SYM 10_1101-436634 654 49 funder funder NN 10_1101-436634 654 50 , , , 10_1101-436634 654 51 who who WP 10_1101-436634 654 52 has have VBZ 10_1101-436634 654 53 granted grant VBN 10_1101-436634 654 54 bioRxiv biorxiv IN 10_1101-436634 654 55 a a DT 10_1101-436634 654 56 license license NN 
10_1101-436634 654 57 to to TO 10_1101-436634 654 58 display display VB 10_1101-436634 654 59 the the DT 10_1101-436634 654 60 preprint preprint NN 10_1101-436634 654 61 in in IN 10_1101-436634 654 62 perpetuity perpetuity NN 10_1101-436634 654 63 . . . 10_1101-436634 655 1 It -PRON- PRP 10_1101-436634 655 2 is be VBZ 10_1101-436634 655 3 made make VBN 10_1101-436634 655 4 available available JJ 10_1101-436634 655 5 under under IN 10_1101-436634 655 6 The the DT 10_1101-436634 655 7 copyright copyright NN 10_1101-436634 655 8 holder holder NN 10_1101-436634 655 9 for for IN 10_1101-436634 655 10 this this DT 10_1101-436634 655 11 preprint preprint NN 10_1101-436634 655 12 ( ( -LRB- 10_1101-436634 655 13 which which WDT 10_1101-436634 655 14 was be VBD 10_1101-436634 655 15 notthis notthis DT 10_1101-436634 655 16 version version NN 10_1101-436634 655 17 posted post VBN 10_1101-436634 655 18 January January NNP 10_1101-436634 655 19 5 5 CD 10_1101-436634 655 20 , , , 10_1101-436634 655 21 2021 2021 CD 10_1101-436634 655 22 . . . 10_1101-436634 655 23 ; ; : 10_1101-436634 655 24 https://doi.org/10.1101/436634doi https://doi.org/10.1101/436634doi NN 10_1101-436634 655 25 : : : 10_1101-436634 655 26 bioRxiv biorxiv VB 10_1101-436634 655 27 preprint preprint NN 10_1101-436634 655 28 https://doi.org/10.1101/436634 https://doi.org/10.1101/436634 NNP 10_1101-436634 655 29 http://creativecommons.org/licenses/by-nc-nd/4.0/ http://creativecommons.org/licenses/by-nc-nd/4.0/ RB 10_1101-436634 655 30 24 24 CD 10_1101-436634 655 31 in in IN 10_1101-436634 655 32 red red NN 10_1101-436634 655 33 ) ) -RRB- 10_1101-436634 655 34 . . . 
10_1101-436634 656 1 Comparing compare VBG 10_1101-436634 656 2 the the DT 10_1101-436634 656 3 observed observed JJ 10_1101-436634 656 4 junctions junction NNS 10_1101-436634 656 5 to to IN 10_1101-436634 656 6 the the DT 10_1101-436634 656 7 reference reference NN 10_1101-436634 656 8 junctions junction NNS 10_1101-436634 656 9 in in IN 10_1101-436634 656 10 the the DT 10_1101-436634 656 11 first first JJ 10_1101-436634 656 12 transcript transcript NN 10_1101-436634 656 13 ( ( -LRB- 10_1101-436634 656 14 top top JJ 10_1101-436634 656 15 panel panel NN 10_1101-436634 656 16 ) ) -RRB- 10_1101-436634 656 17 , , , 10_1101-436634 656 18 RegTools RegTools NNP 10_1101-436634 656 19 checks check NNS 10_1101-436634 656 20 to to TO 10_1101-436634 656 21 see see VB 10_1101-436634 656 22 if if IN 10_1101-436634 656 23 the the DT 10_1101-436634 656 24 observed observed JJ 10_1101-436634 656 25 donor donor NN 10_1101-436634 656 26 and and CC 10_1101-436634 656 27 acceptor acceptor NN 10_1101-436634 656 28 splice splice NN 10_1101-436634 656 29 sites site NNS 10_1101-436634 656 30 are be VBP 10_1101-436634 656 31 found find VBN 10_1101-436634 656 32 in in IN 10_1101-436634 656 33 any any DT 10_1101-436634 656 34 of of IN 10_1101-436634 656 35 the the DT 10_1101-436634 656 36 reference reference NN 10_1101-436634 656 37 exons exon NNS 10_1101-436634 656 38 and and CC 10_1101-436634 656 39 also also RB 10_1101-436634 656 40 counts count VBZ 10_1101-436634 656 41 the the DT 10_1101-436634 656 42 number number NN 10_1101-436634 656 43 of of IN 10_1101-436634 656 44 exons exon NNS 10_1101-436634 656 45 , , , 10_1101-436634 656 46 acceptors acceptor NNS 10_1101-436634 656 47 , , , 10_1101-436634 656 48 and and CC 10_1101-436634 656 49 donors donor NNS 10_1101-436634 656 50 skipped skip VBN 10_1101-436634 656 51 by by IN 10_1101-436634 656 52 a a DT 10_1101-436634 656 53 particular particular JJ 10_1101-436634 656 54 junction junction NN 10_1101-436634 656 55 . . . 
10_1101-436634 657 1 Double double JJ 10_1101-436634 657 2 arrows arrow NNS 10_1101-436634 657 3 represent represent VBP 10_1101-436634 657 4 matches match NNS 10_1101-436634 657 5 between between IN 10_1101-436634 657 6 observed observed JJ 10_1101-436634 657 7 and and CC 10_1101-436634 657 8 reference reference NN 10_1101-436634 657 9 acceptor acceptor NN 10_1101-436634 657 10 / / SYM 10_1101-436634 657 11 donor donor NN 10_1101-436634 657 12 sites site NNS 10_1101-436634 657 13 while while IN 10_1101-436634 657 14 single single JJ 10_1101-436634 657 15 arrows arrow NNS 10_1101-436634 657 16 show show VBP 10_1101-436634 657 17 novel novel JJ 10_1101-436634 657 18 splice splice NN 10_1101-436634 657 19 sites site NNS 10_1101-436634 657 20 . . . 10_1101-436634 658 1 These these DT 10_1101-436634 658 2 steps step NNS 10_1101-436634 658 3 are be VBP 10_1101-436634 658 4 repeated repeat VBN 10_1101-436634 658 5 for for IN 10_1101-436634 658 6 the the DT 10_1101-436634 658 7 rest rest NN 10_1101-436634 658 8 of of IN 10_1101-436634 658 9 the the DT 10_1101-436634 658 10 relevant relevant JJ 10_1101-436634 658 11 transcripts transcript NNS 10_1101-436634 658 12 , , , 10_1101-436634 658 13 keeping keep VBG 10_1101-436634 658 14 track track NN 10_1101-436634 658 15 of of IN 10_1101-436634 658 16 whether whether IN 10_1101-436634 658 17 there there EX 10_1101-436634 658 18 are be VBP 10_1101-436634 658 19 known know VBN 10_1101-436634 658 20 acceptor acceptor NN 10_1101-436634 658 21 - - HYPH 10_1101-436634 658 22 donor donor NN 10_1101-436634 658 23 combinations combination NNS 10_1101-436634 658 24 . . . 
10_1101-436634 659 1 Junctions junction NNS 10_1101-436634 659 2 with with IN 10_1101-436634 659 3 a a DT 10_1101-436634 659 4 known know VBN 10_1101-436634 659 5 donor donor NN 10_1101-436634 659 6 but but CC 10_1101-436634 659 7 novel novel JJ 10_1101-436634 659 8 acceptor acceptor NN 10_1101-436634 659 9 or or CC 10_1101-436634 659 10 vice vice NN 10_1101-436634 659 11 - - HYPH 10_1101-436634 659 12 versa versa JJ 10_1101-436634 659 13 are be VBP 10_1101-436634 659 14 annotated annotate VBN 10_1101-436634 659 15 as as IN 10_1101-436634 659 16 “ " `` 10_1101-436634 659 17 D D NNP 10_1101-436634 659 18 ” " '' 10_1101-436634 659 19 or or CC 10_1101-436634 659 20 “ " `` 10_1101-436634 659 21 A a NN 10_1101-436634 659 22 ” " '' 10_1101-436634 659 23 , , , 10_1101-436634 659 24 respectively respectively RB 10_1101-436634 659 25 . . . 10_1101-436634 660 1 If if IN 10_1101-436634 660 2 both both DT 10_1101-436634 660 3 sites site NNS 10_1101-436634 660 4 are be VBP 10_1101-436634 660 5 known know VBN 10_1101-436634 660 6 but but CC 10_1101-436634 660 7 do do VBP 10_1101-436634 660 8 not not RB 10_1101-436634 660 9 appear appear VB 10_1101-436634 660 10 in in IN 10_1101-436634 660 11 combination combination NN 10_1101-436634 660 12 in in IN 10_1101-436634 660 13 any any DT 10_1101-436634 660 14 transcripts transcript NNS 10_1101-436634 660 15 , , , 10_1101-436634 660 16 the the DT 10_1101-436634 660 17 junction junction NN 10_1101-436634 660 18 is be VBZ 10_1101-436634 660 19 annotated annotate VBN 10_1101-436634 660 20 as as IN 10_1101-436634 660 21 “ " `` 10_1101-436634 660 22 NDA NDA NNP 10_1101-436634 660 23 ” " '' 10_1101-436634 660 24 , , , 10_1101-436634 660 25 whereas whereas IN 10_1101-436634 660 26 if if IN 10_1101-436634 660 27 both both DT 10_1101-436634 660 28 sites site NNS 10_1101-436634 660 29 are be VBP 10_1101-436634 660 30 unknown unknown JJ 10_1101-436634 660 31 , , , 10_1101-436634 660 32 the the DT 10_1101-436634 660 33 junction junction NN 
10_1101-436634 660 34 is be VBZ 10_1101-436634 660 35 annotated annotate VBN 10_1101-436634 660 36 as as IN 10_1101-436634 660 37 “ " `` 10_1101-436634 660 38 N N NNP 10_1101-436634 660 39 ” " '' 10_1101-436634 660 40 . . . 10_1101-436634 661 1 If if IN 10_1101-436634 661 2 the the DT 10_1101-436634 661 3 junction junction NN 10_1101-436634 661 4 is be VBZ 10_1101-436634 661 5 known know VBN 10_1101-436634 661 6 to to IN 10_1101-436634 661 7 the the DT 10_1101-436634 661 8 reference reference NN 10_1101-436634 661 9 GTF gtf NN 10_1101-436634 661 10 , , , 10_1101-436634 661 11 it -PRON- PRP 10_1101-436634 661 12 is be VBZ 10_1101-436634 661 13 marked mark VBN 10_1101-436634 661 14 as as IN 10_1101-436634 661 15 “ " `` 10_1101-436634 661 16 DA DA NNP 10_1101-436634 661 17 ” " '' 10_1101-436634 661 18 . . . 10_1101-436634 662 1 C C NNP 10_1101-436634 662 2 ) ) -RRB- 10_1101-436634 662 3 The the DT 10_1101-436634 662 4 cis cis NN 10_1101-436634 662 5 - - HYPH 10_1101-436634 662 6 splice splice NN 10_1101-436634 662 7 - - HYPH 10_1101-436634 662 8 effects effect NNS 10_1101-436634 662 9 identify identify VBP 10_1101-436634 662 10 command command NN 10_1101-436634 662 11 relies rely VBZ 10_1101-436634 662 12 on on IN 10_1101-436634 662 13 the the DT 10_1101-436634 662 14 variants variant NNS 10_1101-436634 662 15 annotate annotate VBP 10_1101-436634 662 16 , , , 10_1101-436634 662 17 junctions junction NNS 10_1101-436634 662 18 extract extract NN 10_1101-436634 662 19 , , , 10_1101-436634 662 20 and and CC 10_1101-436634 662 21 junctions junction NNS 10_1101-436634 662 22 annotate annotate JJ 10_1101-436634 662 23 submodules submodule NNS 10_1101-436634 662 24 . . . 
10_1101-436634 663 1 This this DT 10_1101-436634 663 2 pipeline pipeline NN 10_1101-436634 663 3 takes take VBZ 10_1101-436634 663 4 variant variant JJ 10_1101-436634 663 5 calls call NNS 10_1101-436634 663 6 and and CC 10_1101-436634 663 7 RNA RNA NNP 10_1101-436634 663 8 - - HYPH 10_1101-436634 663 9 seq seq NN 10_1101-436634 663 10 alignments alignment NNS 10_1101-436634 663 11 along along IN 10_1101-436634 663 12 with with IN 10_1101-436634 663 13 genome genome JJ 10_1101-436634 663 14 and and CC 10_1101-436634 663 15 transcriptome transcriptome DT 10_1101-436634 663 16 references reference NNS 10_1101-436634 663 17 and and CC 10_1101-436634 663 18 outputs output NNS 10_1101-436634 663 19 information information NN 10_1101-436634 663 20 about about IN 10_1101-436634 663 21 novel novel JJ 10_1101-436634 663 22 junctions junction NNS 10_1101-436634 663 23 and and CC 10_1101-436634 663 24 associated associate VBD 10_1101-436634 663 25 potential potential JJ 10_1101-436634 663 26 cis cis NN 10_1101-436634 663 27 splice splice NN 10_1101-436634 663 28 - - HYPH 10_1101-436634 663 29 altering alter VBG 10_1101-436634 663 30 sequence sequence NN 10_1101-436634 663 31 variants variant NNS 10_1101-436634 663 32 . . . 
10_1101-436634 664 1 RegTools RegTools NNP 10_1101-436634 664 2 is be VBZ 10_1101-436634 664 3 agnostic agnostic JJ 10_1101-436634 664 4 to to TO 10_1101-436634 664 5 downstream downstream JJ 10_1101-436634 664 6 research research NN 10_1101-436634 664 7 goals goal NNS 10_1101-436634 664 8 and and CC 10_1101-436634 664 9 its -PRON- PRP$ 10_1101-436634 664 10 output output NN 10_1101-436634 664 11 can can MD 10_1101-436634 664 12 be be VB 10_1101-436634 664 13 filtered filter VBN 10_1101-436634 664 14 through through IN 10_1101-436634 664 15 user user NN 10_1101-436634 664 16 - - HYPH 10_1101-436634 664 17 specific specific JJ 10_1101-436634 664 18 methods method NNS 10_1101-436634 664 19 and and CC 10_1101-436634 664 20 thus thus RB 10_1101-436634 664 21 can can MD 10_1101-436634 664 22 be be VB 10_1101-436634 664 23 applied apply VBN 10_1101-436634 664 24 to to IN 10_1101-436634 664 25 a a DT 10_1101-436634 664 26 broad broad JJ 10_1101-436634 664 27 set set NN 10_1101-436634 664 28 of of IN 10_1101-436634 664 29 scientific scientific JJ 10_1101-436634 664 30 questions question NNS 10_1101-436634 664 31 . . . 
10_1101-436634 667 31 Figure Figure NNP 10_1101-436634 667 32 2 2 CD 10_1101-436634 667 33 . . . 10_1101-436634 668 1 Overview overview NN 10_1101-436634 668 2 of of IN 10_1101-436634 668 3 input input NN 10_1101-436634 668 4 data datum NNS 10_1101-436634 668 5 considered consider VBN 10_1101-436634 668 6 and and CC 10_1101-436634 668 7 significant significant JJ 10_1101-436634 668 8 events event NNS 10_1101-436634 668 9 identified identify VBN 10_1101-436634 668 10 by by IN 10_1101-436634 668 11 RegTools regtool NNS 10_1101-436634 668 12 for for IN 10_1101-436634 668 13 each each DT 10_1101-436634 668 14 tumor tumor NN 10_1101-436634 668 15 type type NN 10_1101-436634 668 16 . . . 
10_1101-436634 669 1 A a LS 10_1101-436634 669 2 ) ) -RRB- 10_1101-436634 669 3 Summary summary NN 10_1101-436634 669 4 of of IN 10_1101-436634 669 5 initial initial JJ 10_1101-436634 669 6 variants variant NNS 10_1101-436634 669 7 considered consider VBN 10_1101-436634 669 8 for for IN 10_1101-436634 669 9 analysis analysis NN 10_1101-436634 669 10 by by IN 10_1101-436634 669 11 RegTools regtool NNS 10_1101-436634 669 12 per per IN 10_1101-436634 669 13 sample sample NN 10_1101-436634 669 14 per per IN 10_1101-436634 669 15 tumor tumor NN 10_1101-436634 669 16 cohort cohort NN 10_1101-436634 669 17 . . . 10_1101-436634 670 1 Each each DT 10_1101-436634 670 2 sample sample NN 10_1101-436634 670 3 ’s ’s POS 10_1101-436634 670 4 variant variant JJ 10_1101-436634 670 5 count count NN 10_1101-436634 670 6 is be VBZ 10_1101-436634 670 7 plotted plot VBN 10_1101-436634 670 8 and and CC 10_1101-436634 670 9 violin violin NN 10_1101-436634 670 10 plots plot NNS 10_1101-436634 670 11 are be VBP 10_1101-436634 670 12 overlaid overlay VBN 10_1101-436634 670 13 for for IN 10_1101-436634 670 14 each each DT 10_1101-436634 670 15 cohort cohort NN 10_1101-436634 670 16 . . . 10_1101-436634 671 1 B B NNP 10_1101-436634 671 2 ) ) -RRB- 10_1101-436634 671 3 Summary Summary NNP 10_1101-436634 671 4 unique unique JJ 10_1101-436634 671 5 exon exon NN 10_1101-436634 671 6 - - HYPH 10_1101-436634 671 7 exon exon NN 10_1101-436634 671 8 junction junction NN 10_1101-436634 671 9 observations observation NNS 10_1101-436634 671 10 for for IN 10_1101-436634 671 11 each each DT 10_1101-436634 671 12 sample sample NN 10_1101-436634 671 13 . . . 
10_1101-436634 672 1 Each each DT 10_1101-436634 672 2 sample sample NN 10_1101-436634 672 3 ’s ’s POS 10_1101-436634 672 4 unique unique JJ 10_1101-436634 672 5 junction junction NN 10_1101-436634 672 6 count count NN 10_1101-436634 672 7 is be VBZ 10_1101-436634 672 8 plotted plot VBN 10_1101-436634 672 9 and and CC 10_1101-436634 672 10 violin violin NN 10_1101-436634 672 11 plots plot NNS 10_1101-436634 672 12 are be VBP 10_1101-436634 672 13 overlaid overlay VBN 10_1101-436634 672 14 for for IN 10_1101-436634 672 15 each each DT 10_1101-436634 672 16 cohort cohort NN 10_1101-436634 672 17 . . . 10_1101-436634 673 1 C C NNP 10_1101-436634 673 2 ) ) -RRB- 10_1101-436634 673 3 Summary Summary NNP 10_1101-436634 673 4 of of IN 10_1101-436634 673 5 significant significant JJ 10_1101-436634 673 6 junction junction NN 10_1101-436634 673 7 types type NNS 10_1101-436634 673 8 for for IN 10_1101-436634 673 9 each each DT 10_1101-436634 673 10 cohort cohort NN 10_1101-436634 673 11 across across IN 10_1101-436634 673 12 each each DT 10_1101-436634 673 13 of of IN 10_1101-436634 673 14 the the DT 10_1101-436634 673 15 variant variant JJ 10_1101-436634 673 16 window window NN 10_1101-436634 673 17 sizes size NNS 10_1101-436634 673 18 that that WDT 10_1101-436634 673 19 were be VBD 10_1101-436634 673 20 used use VBN 10_1101-436634 673 21 in in IN 10_1101-436634 673 22 this this DT 10_1101-436634 673 23 analysis analysis NN 10_1101-436634 673 24 . . . 
10_1101-436634 675 31 Figure figure NN 10_1101-436634 675 32 3 3 CD 10_1101-436634 675 33 . . . 10_1101-436634 676 1 Splice splice NN 10_1101-436634 676 2 regulatory regulatory JJ 10_1101-436634 676 3 variants variant NNS 10_1101-436634 676 4 often often RB 10_1101-436634 676 5 lead lead VBP 10_1101-436634 676 6 to to IN 10_1101-436634 676 7 the the DT 10_1101-436634 676 8 expression expression NN 10_1101-436634 676 9 of of IN 10_1101-436634 676 10 multiple multiple JJ 10_1101-436634 676 11 alternative alternative JJ 10_1101-436634 676 12 junctions junction NNS 10_1101-436634 676 13 . . . 10_1101-436634 677 1 A a LS 10_1101-436634 677 2 ) ) -RRB- 10_1101-436634 677 3 A a DT 10_1101-436634 677 4 single single JJ 10_1101-436634 677 5 variant variant NN 10_1101-436634 677 6 can can MD 10_1101-436634 677 7 result result VB 10_1101-436634 677 8 in in IN 10_1101-436634 677 9 either either DT 10_1101-436634 677 10 one one CD 10_1101-436634 677 11 or or CC 10_1101-436634 677 12 more more JJR 10_1101-436634 677 13 than than IN 10_1101-436634 677 14 one one CD 10_1101-436634 677 15 alternatively alternatively RB 10_1101-436634 677 16 spliced spliced JJ 10_1101-436634 677 17 junctions junction NNS 10_1101-436634 677 18 . . . 
10_1101-436634 678 1 Depicted Depicted NNP 10_1101-436634 678 2 is be VBZ 10_1101-436634 678 3 a a DT 10_1101-436634 678 4 variant variant JJ 10_1101-436634 678 5 resulting resulting NN 10_1101-436634 678 6 in in IN 10_1101-436634 678 7 a a DT 10_1101-436634 678 8 single single JJ 10_1101-436634 678 9 novel novel NN 10_1101-436634 678 10 transcript transcript NN 10_1101-436634 678 11 product product NN 10_1101-436634 678 12 ( ( -LRB- 10_1101-436634 678 13 purple purple NNP 10_1101-436634 678 14 ) ) -RRB- 10_1101-436634 678 15 , , , 10_1101-436634 678 16 a a DT 10_1101-436634 678 17 variant variant JJ 10_1101-436634 678 18 resulting result VBG 10_1101-436634 678 19 in in IN 10_1101-436634 678 20 two two CD 10_1101-436634 678 21 novel novel JJ 10_1101-436634 678 22 transcript transcript NN 10_1101-436634 678 23 products product NNS 10_1101-436634 678 24 that that IN 10_1101-436634 678 25 both both DT 10_1101-436634 678 26 use use VBP 10_1101-436634 678 27 alternate alternate JJ 10_1101-436634 678 28 donor donor NN 10_1101-436634 678 29 sites site NNS 10_1101-436634 678 30 ( ( -LRB- 10_1101-436634 678 31 yellow yellow NNP 10_1101-436634 678 32 ) ) -RRB- 10_1101-436634 678 33 , , , 10_1101-436634 678 34 and and CC 10_1101-436634 678 35 a a DT 10_1101-436634 678 36 variant variant JJ 10_1101-436634 678 37 resulting result VBG 10_1101-436634 678 38 in in IN 10_1101-436634 678 39 multiple multiple JJ 10_1101-436634 678 40 junctions junction NNS 10_1101-436634 678 41 of of IN 10_1101-436634 678 42 different different JJ 10_1101-436634 678 43 types type NNS 10_1101-436634 678 44 ( ( -LRB- 10_1101-436634 678 45 teal teal NN 10_1101-436634 678 46 ) ) -RRB- 10_1101-436634 678 47 . . . 
10_1101-436634 679 1 B b NN 10_1101-436634 679 2 ) ) -RRB- 10_1101-436634 679 3 Stacked stack VBD 10_1101-436634 679 4 bar bar NN 10_1101-436634 679 5 graph graph NN 10_1101-436634 679 6 visualizing visualizing NN 10_1101-436634 679 7 how how WRB 10_1101-436634 679 8 often often RB 10_1101-436634 679 9 a a DT 
10_1101-436634 680 31 variant variant JJ 10_1101-436634 680 32 leads lead NNS 10_1101-436634 680 33 to to IN 10_1101-436634 680 34 each each DT 10_1101-436634 680 35 of of IN 10_1101-436634 680 36 the the DT 10_1101-436634 680 37 categories category NNS 10_1101-436634 680 38 mentioned mention VBN 10_1101-436634 680 39 above above RB 10_1101-436634 680 40 across across IN 10_1101-436634 680 41 the the DT 10_1101-436634 680 42 four four CD 10_1101-436634 680 43 RegTools RegTools NNP 10_1101-436634 680 44 variant variant JJ 10_1101-436634 680 45 windows window NNS 10_1101-436634 680 46 used use VBN 10_1101-436634 680 47 . . . 
10_1101-436634 681 1 This this DT 10_1101-436634 681 2 analysis analysis NN 10_1101-436634 681 3 is be VBZ 10_1101-436634 681 4 for for IN 10_1101-436634 681 5 all all DT 10_1101-436634 681 6 variants variant NNS 10_1101-436634 681 7 that that WDT 10_1101-436634 681 8 RegTools RegTools NNP 10_1101-436634 681 9 identified identify VBN 10_1101-436634 681 10 as as IN 10_1101-436634 681 11 significant significant JJ 10_1101-436634 681 12 . . . 10_1101-436634 682 1 C C NNP 10_1101-436634 682 2 ) ) -RRB- 10_1101-436634 682 3 Bar bar VB 10_1101-436634 682 4 chart chart NN 10_1101-436634 682 5 showing show VBG 10_1101-436634 682 6 how how WRB 10_1101-436634 682 7 often often RB 10_1101-436634 682 8 each each DT 10_1101-436634 682 9 of of IN 10_1101-436634 682 10 the the DT 10_1101-436634 682 11 described describe VBN 10_1101-436634 682 12 junction junction NN 10_1101-436634 682 13 combinations combination NNS 10_1101-436634 682 14 occurs occur VBZ 10_1101-436634 682 15 when when WRB 10_1101-436634 682 16 a a DT 10_1101-436634 682 17 single single JJ 10_1101-436634 682 18 variant variant JJ 10_1101-436634 682 19 results result NNS 10_1101-436634 682 20 in in IN 10_1101-436634 682 21 multiple multiple JJ 10_1101-436634 682 22 junction junction NN 10_1101-436634 682 23 types type NNS 10_1101-436634 682 24 across across IN 10_1101-436634 682 25 each each DT 10_1101-436634 682 26 of of IN 10_1101-436634 682 27 the the DT 10_1101-436634 682 28 RegTools RegTools NNP 10_1101-436634 682 29 splice splice NN 10_1101-436634 682 30 variant variant JJ 10_1101-436634 682 31 windows window NNS 10_1101-436634 682 32 used use VBD 10_1101-436634 682 33 . . . 
10_1101-436634 684 31 Figure figure NN 10_1101-436634 684 32 4 4 CD 10_1101-436634 684 33 . . . 10_1101-436634 685 1 Comparison Comparison NNP 10_1101-436634 685 2 of of IN 10_1101-436634 685 3 RegTools RegTools NNP 10_1101-436634 685 4 with with IN 10_1101-436634 685 5 other other JJ 10_1101-436634 685 6 tools tool NNS 10_1101-436634 685 7 that that WDT 10_1101-436634 685 8 identify identify VBP 10_1101-436634 685 9 potential potential JJ 10_1101-436634 685 10 splice splice NN 10_1101-436634 685 11 altering alter VBG 10_1101-436634 685 12 variants variant NNS 10_1101-436634 685 13 . . . 10_1101-436634 686 1 A a LS 10_1101-436634 686 2 ) ) -RRB- 10_1101-436634 686 3 Conceptual conceptual JJ 10_1101-436634 686 4 diagram diagram NN 10_1101-436634 686 5 of of IN 10_1101-436634 686 6 contrasting contrast VBG 10_1101-436634 686 7 approaches approach NNS 10_1101-436634 686 8 used use VBN 10_1101-436634 686 9 to to TO 10_1101-436634 686 10 identify identify VB 10_1101-436634 686 11 splice splice NN 10_1101-436634 686 12 regulatory regulatory JJ 10_1101-436634 686 13 tools tool NNS 10_1101-436634 686 14 / / SYM 10_1101-436634 686 15 methods method NNS 10_1101-436634 686 16 . . . 
10_1101-436634 687 1 A a DT 10_1101-436634 687 2 red red JJ 10_1101-436634 687 3 dot dot NN 10_1101-436634 687 4 indicates indicate VBZ 10_1101-436634 687 5 that that IN 10_1101-436634 687 6 the the DT 10_1101-436634 687 7 source source NN 10_1101-436634 687 8 only only RB 10_1101-436634 687 9 considers consider VBZ 10_1101-436634 687 10 genomic genomic JJ 10_1101-436634 687 11 data datum NNS 10_1101-436634 687 12 for for IN 10_1101-436634 687 13 making make VBG 10_1101-436634 687 14 its -PRON- PRP$ 10_1101-436634 687 15 calls call NNS 10_1101-436634 687 16 , , , 10_1101-436634 687 17 as as IN 10_1101-436634 687 18 opposed oppose VBN 10_1101-436634 687 19 to to IN 10_1101-436634 687 20 a a DT 10_1101-436634 687 21 combination combination NN 10_1101-436634 687 22 of of IN 10_1101-436634 687 23 genomic genomic JJ 10_1101-436634 687 24 and and CC 10_1101-436634 687 25 transcriptomic transcriptomic JJ 10_1101-436634 687 26 data datum NNS 10_1101-436634 687 27 . . . 10_1101-436634 688 1 B b NN 10_1101-436634 688 2 ) ) -RRB- 10_1101-436634 688 3 UpSet UpSet NNP 10_1101-436634 688 4 plot plot NN 10_1101-436634 688 5 comparing compare VBG 10_1101-436634 688 6 splice splice NN 10_1101-436634 688 7 altering alter VBG 10_1101-436634 688 8 variants variant NNS 10_1101-436634 688 9 identified identify VBN 10_1101-436634 688 10 by by IN 10_1101-436634 688 11 RegTools regtool NNS 10_1101-436634 688 12 to to IN 10_1101-436634 688 13 those those DT 10_1101-436634 688 14 identified identify VBN 10_1101-436634 688 15 by by IN 10_1101-436634 688 16 other other JJ 10_1101-436634 688 17 splice splice NN 10_1101-436634 688 18 variant variant JJ 10_1101-436634 688 19 predictors predictor NNS 10_1101-436634 688 20 and and CC 10_1101-436634 688 21 annotators annotator NNS 10_1101-436634 688 22 . . . 
10_1101-436634 689 1 Each each DT 10_1101-436634 689 2 tool tool NN 10_1101-436634 689 3 and and CC 10_1101-436634 689 4 their -PRON- PRP$ 10_1101-436634 689 5 total total JJ 10_1101-436634 689 6 number number NN 10_1101-436634 689 7 of of IN 10_1101-436634 689 8 variant variant JJ 10_1101-436634 689 9 predictions prediction NNS 10_1101-436634 689 10 are be VBP 10_1101-436634 689 11 shown show VBN 10_1101-436634 689 12 on on IN 10_1101-436634 689 13 the the DT 10_1101-436634 689 14 left left JJ 10_1101-436634 689 15 side side NN 10_1101-436634 689 16 bar bar NN 10_1101-436634 689 17 graph graph NN 10_1101-436634 689 18 . . . 10_1101-436634 690 1 The the DT 10_1101-436634 690 2 numbers number NNS 10_1101-436634 690 3 of of IN 10_1101-436634 690 4 variants variant NNS 10_1101-436634 690 5 specific specific JJ 10_1101-436634 690 6 to to IN 10_1101-436634 690 7 each each DT 10_1101-436634 690 8 tool tool NN 10_1101-436634 690 9 or or CC 10_1101-436634 690 10 shared share VBN 10_1101-436634 690 11 between between IN 10_1101-436634 690 12 different different JJ 10_1101-436634 690 13 combinations combination NNS 10_1101-436634 690 14 of of IN 10_1101-436634 690 15 tools tool NNS 10_1101-436634 690 16 are be VBP 10_1101-436634 690 17 indicated indicate VBN 10_1101-436634 690 18 by by IN 10_1101-436634 690 19 the the DT 10_1101-436634 690 20 bar bar NN 10_1101-436634 690 21 graph graph NN 10_1101-436634 690 22 along along IN 10_1101-436634 690 23 the the DT 10_1101-436634 690 24 top top NN 10_1101-436634 690 25 and and CC 10_1101-436634 690 26 the the DT 10_1101-436634 690 27 individual individual JJ 10_1101-436634 690 28 or or CC 10_1101-436634 690 29 connected connected JJ 10_1101-436634 690 30 dots dot NNS 10_1101-436634 690 31 . . . 
10_1101-436634 692 31 Figure figure NN 10_1101-436634 692 32 5 5 CD 10_1101-436634 692 33 . . . 10_1101-436634 693 1 Pan pan JJ 10_1101-436634 693 2 - - JJ 10_1101-436634 693 3 cancer cancer JJ 10_1101-436634 693 4 analysis analysis NN 10_1101-436634 693 5 of of IN 10_1101-436634 693 6 cohorts cohort NNS 10_1101-436634 693 7 from from IN 10_1101-436634 693 8 TCGA TCGA NNP 10_1101-436634 693 9 and and CC 10_1101-436634 693 10 MGI MGI NNP 10_1101-436634 693 11 reveals reveal VBZ 10_1101-436634 693 12 genes gene NNS 10_1101-436634 693 13 recurrently recurrently RB 10_1101-436634 693 14 disrupted disrupt VBN 10_1101-436634 693 15 by by IN 10_1101-436634 693 16 variants variant NNS 10_1101-436634 693 17 which which WDT 10_1101-436634 693 18 cause cause VBP 10_1101-436634 693 19 non non JJ 10_1101-436634 693 20 - - JJ 10_1101-436634 693 21 canonical canonical JJ 10_1101-436634 693 22 splicing splicing NN 10_1101-436634 693 23 patterns pattern NNS 10_1101-436634 693 24 Results Results NNPS 10_1101-436634 693 25 of of IN 10_1101-436634 693 26 analysis analysis NN 10_1101-436634 693 27 for for IN 10_1101-436634 693 28 recurrently recurrently RB 10_1101-436634 693 29 disrupted disrupt VBN 10_1101-436634 693 30 genes gene NNS 10_1101-436634 693 31 in in IN 10_1101-436634 693 32 each each DT 10_1101-436634 693 33 cohort cohort NN 10_1101-436634 693 34 . . . 
10_1101-436634 694 1 Columns column NNS 10_1101-436634 694 2 correspond correspond VBP 10_1101-436634 694 3 to to IN 10_1101-436634 694 4 the the DT 10_1101-436634 694 5 20 20 CD 10_1101-436634 694 6 most most RBS 10_1101-436634 694 7 frequently frequently RB 10_1101-436634 694 8 recurring recur VBG 10_1101-436634 694 9 genes gene NNS 10_1101-436634 694 10 , , , 10_1101-436634 694 11 as as IN 10_1101-436634 694 12 ranked rank VBN 10_1101-436634 694 13 by by IN 10_1101-436634 694 14 fraction fraction NN 10_1101-436634 694 15 of of IN 10_1101-436634 694 16 samples sample NNS 10_1101-436634 694 17 . . . 10_1101-436634 695 1 Genes gene NNS 10_1101-436634 695 2 are be VBP 10_1101-436634 695 3 clustered cluster VBN 10_1101-436634 695 4 by by IN 10_1101-436634 695 5 whether whether IN 10_1101-436634 695 6 they -PRON- PRP 10_1101-436634 695 7 were be VBD 10_1101-436634 695 8 annotated annotate VBN 10_1101-436634 695 9 by by IN 10_1101-436634 695 10 the the DT 10_1101-436634 695 11 CGC CGC NNP 10_1101-436634 695 12 as as IN 10_1101-436634 695 13 an an DT 10_1101-436634 695 14 oncogene oncogene NN 10_1101-436634 695 15 ( ( -LRB- 10_1101-436634 695 16 red red NNP 10_1101-436634 695 17 ) ) -RRB- 10_1101-436634 695 18 , , , 10_1101-436634 695 19 an an DT 10_1101-436634 695 20 oncogene oncogene NN 10_1101-436634 695 21 and and CC 10_1101-436634 695 22 tumor tumor NN 10_1101-436634 695 23 suppressor suppressor NN 10_1101-436634 695 24 gene gene NN 10_1101-436634 695 25 ( ( -LRB- 10_1101-436634 695 26 yellow yellow NNP 10_1101-436634 695 27 ) ) -RRB- 10_1101-436634 695 28 , , , 10_1101-436634 695 29 a a DT 10_1101-436634 695 30 tumor tumor NN 10_1101-436634 695 31 suppressor suppressor NN 10_1101-436634 695 32 gene gene NN 10_1101-436634 695 33 ( ( -LRB- 10_1101-436634 695 34 green green NNP 10_1101-436634 695 35 ) ) -RRB- 10_1101-436634 695 36 , , , 10_1101-436634 695 37 or or CC 10_1101-436634 696 31 another another DT 10_1101-436634 696 32 type type NN 10_1101-436634 696 33 of of IN 10_1101-436634 696 34 cancer cancer NN 10_1101-436634 696 35 - - HYPH 10_1101-436634 696 36 relevant relevant JJ 10_1101-436634 696 37 gene gene NN 10_1101-436634 696 38 . . . 10_1101-436634 697 1 Shading shade VBG 10_1101-436634 697 2 corresponds correspond NNS 10_1101-436634 697 3 to to IN 10_1101-436634 697 4 −log10(p −log10(p ADD 10_1101-436634 697 5 value value NN 10_1101-436634 697 6 ) ) -RRB- 10_1101-436634 697 7 and and CC 10_1101-436634 697 8 columns column NNS 10_1101-436634 697 9 represent represent VBP 10_1101-436634 697 10 cancer cancer NN 10_1101-436634 697 11 types type NNS 10_1101-436634 697 12 . . . 10_1101-436634 698 1 Red red JJ 10_1101-436634 698 2 marks mark NNS 10_1101-436634 698 3 within within IN 10_1101-436634 698 4 cells cell NNS 10_1101-436634 698 5 indicate indicate VBP 10_1101-436634 698 6 that that IN 10_1101-436634 698 7 the the DT 10_1101-436634 698 8 gene gene NN 10_1101-436634 698 9 was be VBD 10_1101-436634 698 10 annotated annotate VBN 10_1101-436634 698 11 by by IN 10_1101-436634 698 12 CHASMplus CHASMplus NNP 10_1101-436634 698 13 as as IN 10_1101-436634 698 14 a a DT 10_1101-436634 698 15 driver driver NN 10_1101-436634 698 16 within within IN 10_1101-436634 698 17 a a DT 10_1101-436634 698 18 given give VBN 10_1101-436634 698 19 TCGA tcga NN 10_1101-436634 698 20 cohort cohort NN 10_1101-436634 698 21 . . . 
10_1101-436634 701 31 Figure figure NN 10_1101-436634 701 32 6 6 CD 10_1101-436634 701 33 . . . 10_1101-436634 702 1 Several several JJ 10_1101-436634 702 2 SNVs SNVs NNPS 10_1101-436634 702 3 in in IN 10_1101-436634 702 4 B2 B2 NNP 10_1101-436634 702 5 M M NNP 10_1101-436634 702 6 associated associate VBN 10_1101-436634 702 7 with with IN 10_1101-436634 702 8 alternate alternate JJ 10_1101-436634 702 9 acceptor acceptor NN 10_1101-436634 702 10 and and CC 10_1101-436634 702 11 alternate alternate JJ 10_1101-436634 702 12 donor donor NN 10_1101-436634 702 13 usage usage NN 10_1101-436634 702 14 . . . 
10_1101-436634 703 1 A a LS 10_1101-436634 703 2 ) ) -RRB- 10_1101-436634 703 3 IGV igv NN 10_1101-436634 703 4 snapshot snapshot NN 10_1101-436634 703 5 of of IN 10_1101-436634 703 6 three three CD 10_1101-436634 703 7 intronic intronic JJ 10_1101-436634 703 8 variant variant JJ 10_1101-436634 703 9 positions position NNS 10_1101-436634 703 10 found find VBD 10_1101-436634 703 11 to to TO 10_1101-436634 703 12 be be VB 10_1101-436634 703 13 associated associate VBN 10_1101-436634 703 14 with with IN 10_1101-436634 703 15 usage usage NN 10_1101-436634 703 16 of of IN 10_1101-436634 703 17 an an DT 10_1101-436634 703 18 alternate alternate JJ 10_1101-436634 703 19 acceptor acceptor NN 10_1101-436634 703 20 and and CC 10_1101-436634 703 21 alternative alternative JJ 10_1101-436634 703 22 donor donor NN 10_1101-436634 703 23 site site NN 10_1101-436634 703 24 that that WDT 10_1101-436634 703 25 leads lead VBZ 10_1101-436634 703 26 to to IN 10_1101-436634 703 27 formation formation NN 10_1101-436634 703 28 of of IN 10_1101-436634 703 29 novel novel JJ 10_1101-436634 703 30 transcript transcript NN 10_1101-436634 703 31 products product NNS 10_1101-436634 703 32 . . . 10_1101-436634 704 1 This this DT 10_1101-436634 704 2 result result NN 10_1101-436634 704 3 was be VBD 10_1101-436634 704 4 found find VBN 10_1101-436634 704 5 using use VBG 10_1101-436634 704 6 the the DT 10_1101-436634 704 7 default default NN 10_1101-436634 704 8 splice splice NN 10_1101-436634 704 9 variant variant JJ 10_1101-436634 704 10 window window NN 10_1101-436634 704 11 parameter parameter NN 10_1101-436634 704 12 ( ( -LRB- 10_1101-436634 704 13 i2e3 i2e3 ADD 10_1101-436634 704 14 ) ) -RRB- 10_1101-436634 704 15 . . . 
10_1101-436634 705 1 B b NN 10_1101-436634 705 2 ) ) -RRB- 10_1101-436634 705 3 Zoomed Zoomed NNP 10_1101-436634 705 4 in in IN 10_1101-436634 705 5 view view NN 10_1101-436634 705 6 of of IN 10_1101-436634 705 7 the the DT 10_1101-436634 705 8 variants variant NNS 10_1101-436634 705 9 identified identify VBN 10_1101-436634 705 10 by by IN 10_1101-436634 705 11 RegTools regtool NNS 10_1101-436634 705 12 that that WDT 10_1101-436634 705 13 are be VBP 10_1101-436634 705 14 associated associate VBN 10_1101-436634 705 15 with with IN 10_1101-436634 705 16 alternate alternate JJ 10_1101-436634 705 17 acceptor acceptor NN 10_1101-436634 705 18 and and CC 10_1101-436634 705 19 donor donor NN 10_1101-436634 705 20 usage usage NN 10_1101-436634 705 21 . . . 10_1101-436634 706 1 Two two CD 10_1101-436634 706 2 of of IN 10_1101-436634 706 3 these these DT 10_1101-436634 706 4 variant variant JJ 10_1101-436634 706 5 positions position NNS 10_1101-436634 706 6 flank flank VBP 10_1101-436634 706 7 the the DT 10_1101-436634 706 8 acceptor acceptor NN 10_1101-436634 706 9 site site NN 10_1101-436634 706 10 and and CC 10_1101-436634 706 11 one one NN 10_1101-436634 706 12 flanks flank VBZ 10_1101-436634 706 13 the the DT 10_1101-436634 706 14 donor donor NN 10_1101-436634 706 15 site site NN 10_1101-436634 706 16 that that WDT 10_1101-436634 706 17 are be VBP 10_1101-436634 706 18 being be VBG 10_1101-436634 706 19 affected affect VBN 10_1101-436634 706 20 . . . 
10_1101-436634 707 1 C C NNP 10_1101-436634 707 2 ) ) -RRB- 10_1101-436634 707 3 Sashimi Sashimi NNP 10_1101-436634 707 4 plot plot NN 10_1101-436634 707 5 visualizations visualization NNS 10_1101-436634 707 6 for for IN 10_1101-436634 707 7 samples sample NNS 10_1101-436634 707 8 containing contain VBG 10_1101-436634 707 9 the the DT 10_1101-436634 707 10 identified identify VBN 10_1101-436634 707 11 variants variant NNS 10_1101-436634 707 12 that that WDT 10_1101-436634 707 13 show show VBP 10_1101-436634 707 14 alternate alternate JJ 10_1101-436634 707 15 acceptor acceptor NN 10_1101-436634 707 16 usage usage NN 10_1101-436634 707 17 ( ( -LRB- 10_1101-436634 707 18 red red NNP 10_1101-436634 707 19 ) ) -RRB- 10_1101-436634 707 20 or or CC 10_1101-436634 707 21 alternate alternate JJ 10_1101-436634 707 22 donor donor NN 10_1101-436634 707 23 usage usage NN 10_1101-436634 707 24 ( ( -LRB- 10_1101-436634 707 25 orange orange NNP 10_1101-436634 707 26 ) ) -RRB- 10_1101-436634 707 27 . . . 
10_1101-436634 708 1 .CC .CC NFP 10_1101-436634 708 2 - - : 10_1101-436634 708 3 BY by IN 10_1101-436634 708 4 - - HYPH 10_1101-436634 708 5 NC NC NNP 10_1101-436634 708 6 - - HYPH 10_1101-436634 708 7 ND ND NNP 10_1101-436634 708 8 4.0 4.0 CD 10_1101-436634 708 9 International international JJ 10_1101-436634 708 10 licensea licensea NNS 10_1101-436634 708 11 certified certify VBN 10_1101-436634 708 12 by by IN 10_1101-436634 708 13 peer peer NN 10_1101-436634 708 14 review review NN 10_1101-436634 708 15 ) ) -RRB- 10_1101-436634 708 16 is be VBZ 10_1101-436634 708 17 the the DT 10_1101-436634 708 18 author author NN 10_1101-436634 708 19 / / SYM 10_1101-436634 708 20 funder funder NN 10_1101-436634 708 21 , , , 10_1101-436634 708 22 who who WP 10_1101-436634 708 23 has have VBZ 10_1101-436634 708 24 granted grant VBN 10_1101-436634 708 25 bioRxiv biorxiv IN 10_1101-436634 708 26 a a DT 10_1101-436634 708 27 license license NN 10_1101-436634 708 28 to to TO 10_1101-436634 708 29 display display VB 10_1101-436634 708 30 the the DT 10_1101-436634 708 31 preprint preprint NN 10_1101-436634 708 32 in in IN 10_1101-436634 708 33 perpetuity perpetuity NN 10_1101-436634 708 34 . . . 10_1101-436634 709 1 It -PRON- PRP 10_1101-436634 709 2 is be VBZ 10_1101-436634 709 3 made make VBN 10_1101-436634 709 4 available available JJ 10_1101-436634 709 5 under under IN 10_1101-436634 709 6 The the DT 10_1101-436634 709 7 copyright copyright NN 10_1101-436634 709 8 holder holder NN 10_1101-436634 709 9 for for IN 10_1101-436634 709 10 this this DT 10_1101-436634 709 11 preprint preprint NN 10_1101-436634 709 12 ( ( -LRB- 10_1101-436634 709 13 which which WDT 10_1101-436634 709 14 was be VBD 10_1101-436634 709 15 notthis notthis DT 10_1101-436634 709 16 version version NN 10_1101-436634 709 17 posted post VBN 10_1101-436634 709 18 January January NNP 10_1101-436634 709 19 5 5 CD 10_1101-436634 709 20 , , , 10_1101-436634 709 21 2021 2021 CD 10_1101-436634 709 22 . . . 
10_1101-436634 709 31 Supplemental Supplemental NNP 10_1101-436634 709 32 Figures Figures NNPS 10_1101-436634 709 33 Supplementary Supplementary NNP 10_1101-436634 709 34 Figure Figure NNP 10_1101-436634 709 35 1 1 CD 10_1101-436634 709 36 . . . 10_1101-436634 710 1 Benchmarking benchmarke VBG 10_1101-436634 710 2 of of IN 10_1101-436634 710 3 each each DT 10_1101-436634 710 4 RegTools RegTools NNP 10_1101-436634 710 5 command command NN 10_1101-436634 710 6 . . . 10_1101-436634 711 1 The the DT 10_1101-436634 711 2 total total JJ 10_1101-436634 711 3 CPU cpu NN 10_1101-436634 711 4 time time NN 10_1101-436634 711 5 ( ( -LRB- 10_1101-436634 711 6 System system NN 10_1101-436634 711 7 Time time NN 10_1101-436634 711 8 + + CC 10_1101-436634 711 9 User User NNP 10_1101-436634 711 10 Time Time NNP 10_1101-436634 711 11 ) ) -RRB- 10_1101-436634 711 12 and and CC 10_1101-436634 711 13 real real JJ 10_1101-436634 711 14 time time NN 10_1101-436634 711 15 are be VBP 10_1101-436634 711 16 plotted plot VBN 10_1101-436634 711 17 against against IN 10_1101-436634 711 18 the the DT 10_1101-436634 711 19 number number NN 10_1101-436634 711 20 of of IN 10_1101-436634 711 21 entries entry NNS 10_1101-436634 711 22 processed process VBN 10_1101-436634 711 23 for for IN 10_1101-436634 711 24 each each DT 10_1101-436634 711 25 available available JJ 10_1101-436634 711 26 RegTools RegTools NNP 10_1101-436634 711 27 function function NN 10_1101-436634 711 28 using use VBG 10_1101-436634 711 29 10 10 CD 10_1101-436634 
711 30 total total JJ 10_1101-436634 711 31 replicates replicate NNS 10_1101-436634 711 32 . . . 10_1101-436634 712 1 For for IN 10_1101-436634 712 2 the the DT 10_1101-436634 712 3 cis- cis- NN 10_1101-436634 712 4 splice splice NN 10_1101-436634 712 5 - - HYPH 10_1101-436634 712 6 effects effect NNS 10_1101-436634 712 7 identify identify VBP 10_1101-436634 712 8 / / SYM 10_1101-436634 712 9 cis cis NN 10_1101-436634 712 10 - - HYPH 10_1101-436634 712 11 splice splice NN 10_1101-436634 712 12 - - HYPH 10_1101-436634 712 13 effects effect NNS 10_1101-436634 712 14 associate associate NN 10_1101-436634 712 15 / / SYM 10_1101-436634 712 16 variants variant NNS 10_1101-436634 712 17 annotate annotate VBP 10_1101-436634 712 18 workflows workflow NNS 10_1101-436634 712 19 , , , 10_1101-436634 712 20 the the DT 10_1101-436634 712 21 number number NN 10_1101-436634 712 22 of of IN 10_1101-436634 712 23 entries entry NNS 10_1101-436634 712 24 corresponds correspond NNS 10_1101-436634 712 25 to to IN 10_1101-436634 712 26 the the DT 10_1101-436634 712 27 number number NN 10_1101-436634 712 28 of of IN 10_1101-436634 712 29 somatic somatic JJ 10_1101-436634 712 30 variants variant NNS 10_1101-436634 712 31 , , , 10_1101-436634 712 32 whereas whereas IN 10_1101-436634 712 33 the the DT 10_1101-436634 712 34 number number NN 10_1101-436634 712 35 of of IN 10_1101-436634 712 36 entries entry NNS 10_1101-436634 712 37 in in IN 10_1101-436634 712 38 the the DT 10_1101-436634 712 39 junctions junction NNS 10_1101-436634 712 40 extract extract NNP 10_1101-436634 712 41 / / SYM 10_1101-436634 712 42 junctions junction NNS 10_1101-436634 712 43 annotate annotate NN 10_1101-436634 712 44 / / SYM 10_1101-436634 712 45 compare_junctions compare_junction NNS 10_1101-436634 712 46 workflows workflow VBZ 10_1101-436634 712 47 corresponds correspond NNS 10_1101-436634 712 48 to to IN 10_1101-436634 712 49 the the DT 10_1101-436634 712 50 number number NN 10_1101-436634 712 51 of of IN 
10_1101-436634 712 52 reads read NNS 10_1101-436634 712 53 processed process VBN 10_1101-436634 712 54 from from IN 10_1101-436634 712 55 a a DT 10_1101-436634 712 56 downsampled downsampled JJ 10_1101-436634 712 57 BAM bam NN 10_1101-436634 712 58 file file NN 10_1101-436634 712 59 , , , 10_1101-436634 712 60 the the DT 10_1101-436634 712 61 number number NN 10_1101-436634 712 62 of of IN 10_1101-436634 712 63 junctions junction NNS 10_1101-436634 712 64 processed process VBN 10_1101-436634 712 65 , , , 10_1101-436634 712 66 and and CC 10_1101-436634 712 67 the the DT 10_1101-436634 712 68 number number NN 10_1101-436634 712 69 of of IN 10_1101-436634 712 70 candidate candidate NN 10_1101-436634 712 71 variant variant JJ 10_1101-436634 712 72 junction junction NN 10_1101-436634 712 73 pairings pairing NNS 10_1101-436634 712 74 processed process VBN 10_1101-436634 712 75 , , , 10_1101-436634 712 76 respectively respectively RB 10_1101-436634 712 77 . . . 10_1101-436634 713 1 For for IN 10_1101-436634 713 2 compare_junctions compare_junction NNS 10_1101-436634 713 3 , , , 10_1101-436634 713 4 candidate candidate VBP 10_1101-436634 713 5 variant variant JJ 10_1101-436634 713 6 junction junction NN 10_1101-436634 713 7 pairings pairing NNS 10_1101-436634 713 8 were be VBD 10_1101-436634 713 9 compared compare VBN 10_1101-436634 713 10 across across IN 10_1101-436634 713 11 the the DT 10_1101-436634 713 12 number number NN 10_1101-436634 713 13 of of IN 10_1101-436634 713 14 samples sample NNS 10_1101-436634 713 15 in in IN 10_1101-436634 713 16 that that DT 10_1101-436634 713 17 cohort cohort NN 10_1101-436634 713 18 , , , 10_1101-436634 713 19 with with IN 10_1101-436634 713 20 the the DT 10_1101-436634 713 21 largest large JJS 10_1101-436634 713 22 being be VBG 10_1101-436634 713 23 1022 1022 CD 10_1101-436634 713 24 samples sample NNS 10_1101-436634 713 25 that that WDT 10_1101-436634 713 26 comprise comprise VBP 10_1101-436634 713 27 our -PRON- PRP$ 10_1101-436634 
713 28 BRCA BRCA NNP 10_1101-436634 713 29 cohort cohort NN 10_1101-436634 713 30 . . . 10_1101-436634 714 1 LOESS LOESS NNP 10_1101-436634 714 2 curves curve NNS 10_1101-436634 714 3 are be VBP 10_1101-436634 714 4 fitted fit VBN 10_1101-436634 714 5 onto onto IN 10_1101-436634 714 6 each each DT 10_1101-436634 714 7 plot plot NN 10_1101-436634 714 8 . . . 10_1101-436634 716 31 Supplementary Supplementary NNP 10_1101-436634 716 32 Figure Figure NNP 10_1101-436634 716 33 2 2 CD 10_1101-436634 716 34 . . . 
10_1101-436634 717 1 Summary summary NN 10_1101-436634 717 2 of of IN 10_1101-436634 717 3 variants variant NNS 10_1101-436634 717 4 analyzed analyze VBN 10_1101-436634 717 5 by by IN 10_1101-436634 717 6 RegTools regtool NNS 10_1101-436634 717 7 in in IN 10_1101-436634 717 8 each each DT 10_1101-436634 717 9 tumor tumor NN 10_1101-436634 717 10 cohort cohort NN 10_1101-436634 717 11 Summary Summary NNP 10_1101-436634 717 12 of of IN 10_1101-436634 717 13 the the DT 10_1101-436634 717 14 starting start VBG 10_1101-436634 717 15 number number NN 10_1101-436634 717 16 of of IN 10_1101-436634 717 17 high high JJ 10_1101-436634 717 18 quality quality NN 10_1101-436634 717 19 variants variant NNS 10_1101-436634 717 20 per per IN 10_1101-436634 717 21 sample sample NN 10_1101-436634 717 22 , , , 10_1101-436634 717 23 the the DT 10_1101-436634 717 24 number number NN 10_1101-436634 717 25 of of IN 10_1101-436634 717 26 initial initial JJ 10_1101-436634 717 27 variants variant NNS 10_1101-436634 717 28 considered consider VBN 10_1101-436634 717 29 for for IN 10_1101-436634 717 30 analysis analysis NN 10_1101-436634 717 31 by by IN 10_1101-436634 717 32 RegTools regtool NNS 10_1101-436634 717 33 for for IN 10_1101-436634 717 34 each each DT 10_1101-436634 717 35 variant variant JJ 10_1101-436634 717 36 window window NN 10_1101-436634 717 37 used use VBN 10_1101-436634 717 38 per per IN 10_1101-436634 717 39 tumor tumor NN 10_1101-436634 717 40 cohort cohort NN 10_1101-436634 717 41 , , , 10_1101-436634 717 42 and and CC 10_1101-436634 717 43 the the DT 10_1101-436634 717 44 number number NN 10_1101-436634 717 45 of of IN 10_1101-436634 717 46 significant significant JJ 10_1101-436634 717 47 variants variant NNS 10_1101-436634 717 48 for for IN 10_1101-436634 717 49 each each DT 10_1101-436634 717 50 variant variant JJ 10_1101-436634 717 51 window window NN 10_1101-436634 717 52 used use VBN 10_1101-436634 717 53 per per IN 10_1101-436634 717 54 tumor tumor NN 10_1101-436634 
717 55 cohort cohort NN 10_1101-436634 717 56 . . . 10_1101-436634 719 31 Supplementary Supplementary NNP 10_1101-436634 719 32 Figure Figure NNP 10_1101-436634 719 33 3 3 CD 10_1101-436634 719 34 . . . 10_1101-436634 720 1 Visualization visualization NN 10_1101-436634 720 2 of of IN 10_1101-436634 720 3 junctions junction NNS 10_1101-436634 720 4 across across IN 10_1101-436634 720 5 cohorts cohort NNS 10_1101-436634 720 6 . . . 
10_1101-436634 721 1 Summary summary NN 10_1101-436634 721 2 of of IN 10_1101-436634 721 3 the the DT 10_1101-436634 721 4 total total JJ 10_1101-436634 721 5 junction junction NN 10_1101-436634 721 6 read read VBD 10_1101-436634 721 7 counts count NNS 10_1101-436634 721 8 , , , 10_1101-436634 721 9 unique unique JJ 10_1101-436634 721 10 junctions junction NNS 10_1101-436634 721 11 ( ( -LRB- 10_1101-436634 721 12 all all DT 10_1101-436634 721 13 types type NNS 10_1101-436634 721 14 ) ) -RRB- 10_1101-436634 721 15 , , , 10_1101-436634 721 16 unique unique JJ 10_1101-436634 721 17 known know VBN 10_1101-436634 721 18 ( ( -LRB- 10_1101-436634 721 19 DA DA NNP 10_1101-436634 721 20 ) ) -RRB- 10_1101-436634 721 21 junctions junction NNS 10_1101-436634 721 22 , , , 10_1101-436634 721 23 unique unique JJ 10_1101-436634 721 24 known know VBN 10_1101-436634 721 25 ( ( -LRB- 10_1101-436634 721 26 DA DA NNP 10_1101-436634 721 27 ) ) -RRB- 10_1101-436634 721 28 junctions junction NNS 10_1101-436634 721 29 not not RB 10_1101-436634 721 30 found find VBN 10_1101-436634 721 31 in in IN 10_1101-436634 721 32 GTEx GTEx NNS 10_1101-436634 721 33 , , , 10_1101-436634 721 34 unique unique JJ 10_1101-436634 721 35 D d NN 10_1101-436634 721 36 , , , 10_1101-436634 721 37 A A NNP 10_1101-436634 721 38 , , , 10_1101-436634 721 39 NDA NDA NNP 10_1101-436634 721 40 junctions junction NNS 10_1101-436634 721 41 , , , 10_1101-436634 721 42 and and CC 10_1101-436634 721 43 unique unique JJ 10_1101-436634 721 44 D d NN 10_1101-436634 721 45 , , , 10_1101-436634 721 46 A A NNP 10_1101-436634 721 47 , , , 10_1101-436634 721 48 NDA NDA NNP 10_1101-436634 721 49 junctions junction NNS 10_1101-436634 721 50 not not RB 10_1101-436634 721 51 found find VBN 10_1101-436634 721 52 in in IN 10_1101-436634 721 53 GTEx GTEx NNS 10_1101-436634 721 54 per per IN 10_1101-436634 721 55 sample sample NN 10_1101-436634 721 56 per per IN 10_1101-436634 721 57 cohort cohort NN 10_1101-436634 721 58 . . . 
10_1101-436634 723 31 Supplementary Supplementary NNP 10_1101-436634 723 32 Figure Figure NNP 10_1101-436634 723 33 4 4 CD 10_1101-436634 723 34 : : : 10_1101-436634 723 35 Intronic intronic JJ 10_1101-436634 723 36 SNV SNV NNP 10_1101-436634 723 37 in in IN 10_1101-436634 723 38 CTTN CTTN NNP 10_1101-436634 723 39 associated associate VBN 10_1101-436634 723 40 with with IN 10_1101-436634 723 41 an an DT 10_1101-436634 723 42 exon exon JJ 10_1101-436634 723 43 skipping skipping NN 10_1101-436634 723 44 event event NN 10_1101-436634 723 45 . . . 
10_1101-436634 724 1 A a LS 10_1101-436634 724 2 ) ) -RRB- 10_1101-436634 724 3 IGV igv NN 10_1101-436634 724 4 snapshot snapshot NN 10_1101-436634 724 5 of of IN 10_1101-436634 724 6 a a DT 10_1101-436634 724 7 single single JJ 10_1101-436634 724 8 nucleotide nucleotide JJ 10_1101-436634 724 9 variant variant NN 10_1101-436634 724 10 ( ( -LRB- 10_1101-436634 724 11 GRCh38 grch38 NN 10_1101-436634 724 12 , , , 10_1101-436634 724 13 chr11 chr11 NN 10_1101-436634 724 14 : : : 10_1101-436634 724 15 g.70407517G g.70407517g ADD 10_1101-436634 724 16 > > XX 10_1101-436634 724 17 C C NNP 10_1101-436634 724 18 ) ) -RRB- 10_1101-436634 724 19 within within IN 10_1101-436634 724 20 an an DT 10_1101-436634 724 21 intron intron NN 10_1101-436634 724 22 of of IN 10_1101-436634 724 23 CTTN CTTN NNP 10_1101-436634 724 24 in in IN 10_1101-436634 724 25 LUAD LUAD NNP 10_1101-436634 724 26 sample sample NN 10_1101-436634 724 27 TCGA-86 TCGA-86 NNP 10_1101-436634 724 28 - - HYPH 10_1101-436634 724 29 6851 6851 CD 10_1101-436634 724 30 - - HYPH 10_1101-436634 724 31 01A. 01a. 
CD 10_1101-436634 725 1 This this DT 10_1101-436634 725 2 variant variant NN 10_1101-436634 725 3 is be VBZ 10_1101-436634 725 4 associated associate VBN 10_1101-436634 725 5 with with IN 10_1101-436634 725 6 an an DT 10_1101-436634 725 7 exon exon JJ 10_1101-436634 725 8 skipping skipping NN 10_1101-436634 725 9 event event NN 10_1101-436634 725 10 causing cause VBG 10_1101-436634 725 11 the the DT 10_1101-436634 725 12 formation formation NN 10_1101-436634 725 13 of of IN 10_1101-436634 725 14 an an DT 10_1101-436634 725 15 NDA NDA NNP 10_1101-436634 725 16 junction junction NN 10_1101-436634 725 17 , , , 10_1101-436634 725 18 JUNC00027688 JUNC00027688 NNP 10_1101-436634 725 19 , , , 10_1101-436634 725 20 which which WDT 10_1101-436634 725 21 has have VBZ 10_1101-436634 725 22 44 44 CD 10_1101-436634 725 23 reads read NNS 10_1101-436634 725 24 of of IN 10_1101-436634 725 25 support support NN 10_1101-436634 725 26 . . . 10_1101-436634 726 1 The the DT 10_1101-436634 726 2 variant variant NN 10_1101-436634 726 3 was be VBD 10_1101-436634 726 4 identified identify VBN 10_1101-436634 726 5 by by IN 10_1101-436634 726 6 RegTools RegTools NNP 10_1101-436634 726 7 , , , 10_1101-436634 726 8 VEP VEP NNP 10_1101-436634 726 9 , , , 10_1101-436634 726 10 and and CC 10_1101-436634 726 11 Veridical veridical JJ 10_1101-436634 726 12 but but CC 10_1101-436634 726 13 no no DT 10_1101-436634 726 14 other other JJ 10_1101-436634 726 15 tools tool NNS 10_1101-436634 726 16 . . . 
10_1101-436634 727 1 This this DT 10_1101-436634 727 2 result result NN 10_1101-436634 727 3 was be VBD 10_1101-436634 727 4 found find VBN 10_1101-436634 727 5 using use VBG 10_1101-436634 727 6 the the DT 10_1101-436634 727 7 default default NN 10_1101-436634 727 8 splice splice NN 10_1101-436634 727 9 variant variant JJ 10_1101-436634 727 10 window window NN 10_1101-436634 727 11 parameter parameter NN 10_1101-436634 727 12 ( ( -LRB- 10_1101-436634 727 13 i2e3 i2e3 ADD 10_1101-436634 727 14 ) ) -RRB- 10_1101-436634 727 15 . . . 10_1101-436634 728 1 B b NN 10_1101-436634 728 2 ) ) -RRB- 10_1101-436634 728 3 Sashimi Sashimi NNP 10_1101-436634 728 4 plot plot NN 10_1101-436634 728 5 visualization visualization NN 10_1101-436634 728 6 of of IN 10_1101-436634 728 7 the the DT 10_1101-436634 728 8 novel novel JJ 10_1101-436634 728 9 junction junction NN 10_1101-436634 728 10 . . . 10_1101-436634 730 31 Supplementary Supplementary NNP 10_1101-436634 730 32 Figure figure NN 10_1101-436634 730 33 5 5 CD 10_1101-436634 730 34 : : : 10_1101-436634 730 35 Exonic exonic JJ 10_1101-436634 730 36 SNV SNV NNP 10_1101-436634 730 37 in in IN 10_1101-436634 730 38 LZTR1 lztr1 NN 10_1101-436634 730 39 associated associate VBN 10_1101-436634 730 40 with with IN 10_1101-436634 730 41 alternative alternative JJ 10_1101-436634 730 42 donor donor NN 10_1101-436634 730 43 usage usage NN 10_1101-436634 730 44 . . . 
10_1101-436634 731 1 A a LS 10_1101-436634 731 2 ) ) -RRB- 10_1101-436634 731 3 IGV igv NN 10_1101-436634 731 4 snapshot snapshot NN 10_1101-436634 731 5 of of IN 10_1101-436634 731 6 a a DT 10_1101-436634 731 7 single single JJ 10_1101-436634 731 8 nucleotide nucleotide JJ 10_1101-436634 731 9 variant variant NN 10_1101-436634 731 10 ( ( -LRB- 10_1101-436634 731 11 GRCh38 GRCh38 NNP 10_1101-436634 731 12 , , , 10_1101-436634 731 13 chr22 chr22 NNS 10_1101-436634 731 14 : : : 10_1101-436634 731 15 g.20995026G g.20995026G NNP 10_1101-436634 731 16 > > XX 10_1101-436634 731 17 C C NNP 10_1101-436634 731 18 ) ) -RRB- 10_1101-436634 731 19 within within IN 10_1101-436634 731 20 an an DT 10_1101-436634 731 21 exon exon NN 10_1101-436634 731 22 of of IN 10_1101-436634 731 23 LZTR1 lztr1 NN 10_1101-436634 731 24 in in IN 10_1101-436634 731 25 LUAD LUAD NNP 10_1101-436634 731 26 sample sample NN 10_1101-436634 731 27 TCGA-38 tcga-38 JJ 10_1101-436634 731 28 - - HYPH 10_1101-436634 731 29 4631 4631 CD 10_1101-436634 731 30 - - HYPH 10_1101-436634 731 31 01A. 01A. . 10_1101-436634 732 1 This this DT 10_1101-436634 732 2 variant variant NN 10_1101-436634 732 3 is be VBZ 10_1101-436634 732 4 associated associate VBN 10_1101-436634 732 5 with with IN 10_1101-436634 732 6 the the DT 10_1101-436634 732 7 formation formation NN 10_1101-436634 732 8 of of IN 10_1101-436634 732 9 an an DT 10_1101-436634 732 10 A A NNP 10_1101-436634 732 11 junction junction NN 10_1101-436634 732 12 , , , 10_1101-436634 732 13 JUNC00075013 JUNC00075013 NNP 10_1101-436634 732 14 , , , 10_1101-436634 732 15 which which WDT 10_1101-436634 732 16 has have VBZ 10_1101-436634 732 17 49 49 CD 10_1101-436634 732 18 reads read NNS 10_1101-436634 732 19 of of IN 10_1101-436634 732 20 support support NN 10_1101-436634 732 21 . . . 
10_1101-436634 733 1 The the DT 10_1101-436634 733 2 variant variant NN 10_1101-436634 733 3 was be VBD 10_1101-436634 733 4 identified identify VBN 10_1101-436634 733 5 by by IN 10_1101-436634 733 6 RegTools RegTools NNP 10_1101-436634 733 7 , , , 10_1101-436634 733 8 VEP VEP NNP 10_1101-436634 733 9 , , , 10_1101-436634 733 10 and and CC 10_1101-436634 733 11 SpliceAI SpliceAI NNP 10_1101-436634 733 12 but but CC 10_1101-436634 733 13 no no DT 10_1101-436634 733 14 other other JJ 10_1101-436634 733 15 tools tool NNS 10_1101-436634 733 16 . . . 10_1101-436634 734 1 This this DT 10_1101-436634 734 2 result result NN 10_1101-436634 734 3 was be VBD 10_1101-436634 734 4 found find VBN 10_1101-436634 734 5 using use VBG 10_1101-436634 734 6 the the DT 10_1101-436634 734 7 default default NN 10_1101-436634 734 8 splice splice NN 10_1101-436634 734 9 variant variant JJ 10_1101-436634 734 10 window window NN 10_1101-436634 734 11 parameter parameter NN 10_1101-436634 734 12 ( ( -LRB- 10_1101-436634 734 13 i2e3 i2e3 ADD 10_1101-436634 734 14 ) ) -RRB- 10_1101-436634 734 15 . . . 10_1101-436634 735 1 B b NN 10_1101-436634 735 2 ) ) -RRB- 10_1101-436634 735 3 Sashimi Sashimi NNP 10_1101-436634 735 4 plot plot NN 10_1101-436634 735 5 visualization visualization NN 10_1101-436634 735 6 of of IN 10_1101-436634 735 7 the the DT 10_1101-436634 735 8 novel novel JJ 10_1101-436634 735 9 junction junction NN 10_1101-436634 735 10 . . . 
10_1101-436634 737 31 Supplementary Supplementary NNP 10_1101-436634 737 32 Figure Figure NNP 10_1101-436634 737 33 6 6 CD 10_1101-436634 737 34 . . . 10_1101-436634 738 1 Pan pan JJ 10_1101-436634 738 2 - - JJ 10_1101-436634 738 3 cancer cancer JJ 10_1101-436634 738 4 analysis analysis NN 10_1101-436634 738 5 of of IN 10_1101-436634 738 6 cohorts cohort NNS 10_1101-436634 738 7 from from IN 10_1101-436634 738 8 TCGA TCGA NNP 10_1101-436634 738 9 and and CC 10_1101-436634 738 10 MGI MGI NNP 10_1101-436634 738 11 reveals reveal VBZ 10_1101-436634 738 12 genes gene NNS 10_1101-436634 738 13 recurrently recurrently RB 10_1101-436634 738 14 disrupted disrupt VBN 10_1101-436634 738 15 by by IN 10_1101-436634 738 16 variants variant NNS 10_1101-436634 738 17 which which WDT 10_1101-436634 738 18 cause cause VBP 10_1101-436634 738 19 non non JJ 10_1101-436634 738 20 - - JJ 10_1101-436634 738 21 canonical canonical JJ 10_1101-436634 738 22 splicing splicing NN 10_1101-436634 738 23 patterns pattern NNS 10_1101-436634 738 24 Results Results NNPS 10_1101-436634 738 25 of of IN 10_1101-436634 738 26 analysis analysis NN 10_1101-436634 738 27 for for IN 10_1101-436634 738 28 recurrently recurrently RB 10_1101-436634 738 29 disrupted disrupt VBN 10_1101-436634 738 30 genes gene NNS 10_1101-436634 738 31 in in IN 10_1101-436634 738 32 each each DT 10_1101-436634 738 33 cohort cohort NN 10_1101-436634 738 34 . . . 
10_1101-436634 739 1 A a LS 10_1101-436634 739 2 ) ) -RRB- 10_1101-436634 739 3 Rows row NNS 10_1101-436634 739 4 correspond correspond VBP 10_1101-436634 739 5 to to IN 10_1101-436634 739 6 the the DT 10_1101-436634 739 7 40 40 CD 10_1101-436634 739 8 most most RBS 10_1101-436634 739 9 frequently frequently RB 10_1101-436634 739 10 recurring recur VBG 10_1101-436634 739 11 genes gene NNS 10_1101-436634 739 12 , , , 10_1101-436634 739 13 as as IN 10_1101-436634 739 14 ranked rank VBN 10_1101-436634 739 15 by by IN 10_1101-436634 739 16 binomial binomial NNP 10_1101-436634 739 17 p p NN 10_1101-436634 739 18 - - HYPH 10_1101-436634 739 19 value value NN 10_1101-436634 739 20 . . . 10_1101-436634 740 1 Genes gene NNS 10_1101-436634 740 2 are be VBP 10_1101-436634 740 3 clustered cluster VBN 10_1101-436634 740 4 by by IN 10_1101-436634 740 5 whether whether IN 10_1101-436634 740 6 they -PRON- PRP 10_1101-436634 740 7 were be VBD 10_1101-436634 740 8 annotated annotate VBN 10_1101-436634 740 9 by by IN 10_1101-436634 740 10 the the DT 10_1101-436634 740 11 CGC CGC NNP 10_1101-436634 740 12 as as IN 10_1101-436634 740 13 an an DT 10_1101-436634 740 14 oncogene oncogene NN 10_1101-436634 740 15 ( ( -LRB- 10_1101-436634 740 16 red red NNP 10_1101-436634 740 17 ) ) -RRB- 10_1101-436634 740 18 , , , 10_1101-436634 740 19 an an DT 10_1101-436634 740 20 oncogene oncogene NN 10_1101-436634 740 21 and and CC 10_1101-436634 740 22 tumor tumor NN 10_1101-436634 740 23 suppressor suppressor NN 10_1101-436634 740 24 gene gene NN 10_1101-436634 740 25 ( ( -LRB- 10_1101-436634 740 26 yellow yellow NNP 10_1101-436634 740 27 ) ) -RRB- 10_1101-436634 740 28 , , , 10_1101-436634 740 29 a a DT 10_1101-436634 740 30 tumor tumor NN 10_1101-436634 740 31 suppressor suppressor NN 10_1101-436634 740 32 gene gene NN 10_1101-436634 740 33 ( ( -LRB- 10_1101-436634 740 34 green green NNP 10_1101-436634 740 35 ) ) -RRB- 10_1101-436634 740 36 , , , 10_1101-436634 740 37 or or CC 10_1101-436634 740 
38 another another DT 10_1101-436634 740 39 type type NN 10_1101-436634 740 40 of of IN 10_1101-436634 740 41 cancer cancer NN 10_1101-436634 740 42 - - HYPH 10_1101-436634 740 43 relevant relevant JJ 10_1101-436634 740 44 gene gene NN 10_1101-436634 740 45 . . . 10_1101-436634 741 1 Shading shade VBG 10_1101-436634 741 2 corresponds correspond NNS 10_1101-436634 741 3 to to IN 10_1101-436634 741 4 −log10(p −log10(p ADD 10_1101-436634 741 5 value value NN 10_1101-436634 741 6 ) ) -RRB- 10_1101-436634 741 7 and and CC 10_1101-436634 741 8 columns column NNS 10_1101-436634 741 9 represent represent VBP 10_1101-436634 741 10 cancer cancer NN 10_1101-436634 741 11 types type NNS 10_1101-436634 741 12 . . . 10_1101-436634 742 1 Red red JJ 10_1101-436634 742 2 marks mark NNS 10_1101-436634 742 3 within within IN 10_1101-436634 742 4 cells cell NNS 10_1101-436634 742 5 indicate indicate VBP 10_1101-436634 742 6 that that IN 10_1101-436634 742 7 the the DT 10_1101-436634 742 8 gene gene NN 10_1101-436634 742 9 was be VBD 10_1101-436634 742 10 annotated annotate VBN 10_1101-436634 742 11 by by IN 10_1101-436634 742 12 CHASMplus CHASMplus NNP 10_1101-436634 742 13 as as IN 10_1101-436634 742 14 a a DT 10_1101-436634 742 15 driver driver NN 10_1101-436634 742 16 within within IN 10_1101-436634 742 17 a a DT 10_1101-436634 742 18 given give VBN 10_1101-436634 742 19 TCGA tcga NN 10_1101-436634 742 20 cohort cohort NN 10_1101-436634 742 21 . . . 
10_1101-436634 743 1 B b NN 10_1101-436634 743 2 ) ) -RRB- 10_1101-436634 743 3 Rows row NNS 10_1101-436634 743 4 correspond correspond VB 10_1101-436634 743 5 to to IN 10_1101-436634 743 6 the the DT 10_1101-436634 743 7 40 40 CD 10_1101-436634 743 8 most most RBS 10_1101-436634 743 9 frequently frequently RB 10_1101-436634 743 10 recurring recur VBG 10_1101-436634 743 11 genes gene NNS 10_1101-436634 743 12 , , , 10_1101-436634 743 13 as as IN 10_1101-436634 743 14 ranked rank VBN 10_1101-436634 743 15 by by IN 10_1101-436634 743 16 fraction fraction NN 10_1101-436634 743 17 of of IN 10_1101-436634 743 18 samples sample NNS 10_1101-436634 743 19 . . . 10_1101-436634 744 1 Shading shade VBG 10_1101-436634 744 2 corresponds correspond NNS 10_1101-436634 744 3 to to IN 10_1101-436634 744 4 the the DT 10_1101-436634 744 5 fraction fraction NN 10_1101-436634 744 6 of of IN 10_1101-436634 744 7 samples sample NNS 10_1101-436634 744 8 and and CC 10_1101-436634 744 9 columns column NNS 10_1101-436634 744 10 represent represent VBP 10_1101-436634 744 11 cancer cancer NN 10_1101-436634 744 12 types type NNS 10_1101-436634 744 13 . . . 10_1101-436634 745 1 Red red JJ 10_1101-436634 745 2 marks mark NNS 10_1101-436634 745 3 within within IN 10_1101-436634 745 4 cells cell NNS 10_1101-436634 745 5 indicate indicate VBP 10_1101-436634 745 6 that that IN 10_1101-436634 745 7 the the DT 10_1101-436634 745 8 gene gene NN 10_1101-436634 745 9 was be VBD 10_1101-436634 745 10 annotated annotate VBN 10_1101-436634 745 11 by by IN 10_1101-436634 745 12 CHASMplus CHASMplus NNP 10_1101-436634 745 13 as as IN 10_1101-436634 745 14 a a DT 10_1101-436634 745 15 driver driver NN 10_1101-436634 745 16 within within IN 10_1101-436634 745 17 a a DT 10_1101-436634 745 18 given give VBN 10_1101-436634 745 19 TCGA tcga NN 10_1101-436634 745 20 cohort cohort NN 10_1101-436634 745 21 . . . 
10_1101-436634 747 31 Supplementary Supplementary NNP 10_1101-436634 747 32 Figure Figure NNP 10_1101-436634 747 33 7 7 CD 10_1101-436634 747 34 . . . 10_1101-436634 748 1 Pan pan JJ 10_1101-436634 748 2 - - JJ 10_1101-436634 748 3 cancer cancer JJ 10_1101-436634 748 4 analysis analysis NN 10_1101-436634 748 5 of of IN 10_1101-436634 748 6 cohorts cohort NNS 10_1101-436634 748 7 from from IN 10_1101-436634 748 8 TCGA TCGA NNP 10_1101-436634 748 9 and and CC 10_1101-436634 748 10 MGI MGI NNP 10_1101-436634 748 11 reveals reveal VBZ 10_1101-436634 748 12 genes gene NNS 10_1101-436634 748 13 recurrently recurrently RB 10_1101-436634 748 14 disrupted disrupt VBN 10_1101-436634 748 15 by by IN 10_1101-436634 748 16 variants variant NNS 10_1101-436634 748 17 which which WDT 10_1101-436634 748 18 promote promote VBP 10_1101-436634 748 19 splicing splicing NN 10_1101-436634 748 20 of of IN 10_1101-436634 748 21 particular particular JJ 10_1101-436634 748 22 canonical canonical JJ 10_1101-436634 748 23 junctions junction NNS 10_1101-436634 748 24 Results Results NNPS 10_1101-436634 748 25 of of IN 10_1101-436634 748 26 analysis analysis NN 10_1101-436634 748 27 for for IN 10_1101-436634 748 28 recurrently recurrently RB 10_1101-436634 748 29 disrupted disrupt VBN 10_1101-436634 748 30 genes gene NNS 10_1101-436634 748 31 in in IN 10_1101-436634 748 32 each each DT 10_1101-436634 748 33 cohort cohort NN 10_1101-436634 748 34 . . . 
10_1101-436634 749 1 A a LS 10_1101-436634 749 2 ) ) -RRB- 10_1101-436634 749 3 Rows row NNS 10_1101-436634 749 4 correspond correspond VBP 10_1101-436634 749 5 to to IN 10_1101-436634 749 6 the the DT 10_1101-436634 749 7 40 40 CD 10_1101-436634 749 8 most most RBS 10_1101-436634 749 9 frequently frequently RB 10_1101-436634 749 10 recurring recur VBG 10_1101-436634 749 11 genes gene NNS 10_1101-436634 749 12 , , , 10_1101-436634 749 13 as as IN 10_1101-436634 749 14 ranked rank VBN 10_1101-436634 749 15 by by IN 10_1101-436634 749 16 binomial binomial NNP 10_1101-436634 749 17 p p NN 10_1101-436634 749 18 - - HYPH 10_1101-436634 749 19 value value NN 10_1101-436634 749 20 . . . 10_1101-436634 750 1 Genes gene NNS 10_1101-436634 750 2 are be VBP 10_1101-436634 750 3 clustered cluster VBN 10_1101-436634 750 4 by by IN 10_1101-436634 750 5 whether whether IN 10_1101-436634 750 6 they -PRON- PRP 10_1101-436634 750 7 were be VBD 10_1101-436634 750 8 annotated annotate VBN 10_1101-436634 750 9 by by IN 10_1101-436634 750 10 the the DT 10_1101-436634 750 11 CGC CGC NNP 10_1101-436634 750 12 as as IN 10_1101-436634 750 13 an an DT 10_1101-436634 750 14 oncogene oncogene NN 10_1101-436634 750 15 ( ( -LRB- 10_1101-436634 750 16 red red NNP 10_1101-436634 750 17 ) ) -RRB- 10_1101-436634 750 18 , , , 10_1101-436634 750 19 an an DT 10_1101-436634 750 20 oncogene oncogene NN 10_1101-436634 750 21 and and CC 10_1101-436634 750 22 tumor tumor NN 10_1101-436634 750 23 suppressor suppressor NN 10_1101-436634 750 24 gene gene NN 10_1101-436634 750 25 ( ( -LRB- 10_1101-436634 750 26 yellow yellow NNP 10_1101-436634 750 27 ) ) -RRB- 10_1101-436634 750 28 , , , 10_1101-436634 750 29 a a DT 10_1101-436634 750 30 tumor tumor NN 10_1101-436634 750 31 suppressor suppressor NN 10_1101-436634 750 32 gene gene NN 10_1101-436634 750 33 ( ( -LRB- 10_1101-436634 750 34 green green NNP 10_1101-436634 750 35 ) ) -RRB- 10_1101-436634 750 36 , , , 10_1101-436634 750 37 or or CC 10_1101-436634 750 
38 another another DT 10_1101-436634 750 39 type type NN 10_1101-436634 750 40 of of IN 10_1101-436634 750 41 cancer cancer NN 10_1101-436634 750 42 - - HYPH 10_1101-436634 750 43 relevant relevant JJ 10_1101-436634 750 44 gene gene NN 10_1101-436634 750 45 . . . 10_1101-436634 751 1 Shading shade VBG 10_1101-436634 751 2 corresponds correspond NNS 10_1101-436634 751 3 to to IN 10_1101-436634 751 4 −log10(p −log10(p ADD 10_1101-436634 751 5 value value NN 10_1101-436634 751 6 ) ) -RRB- 10_1101-436634 751 7 and and CC 10_1101-436634 751 8 columns column NNS 10_1101-436634 751 9 represent represent VBP 10_1101-436634 751 10 cancer cancer NN 10_1101-436634 751 11 types type NNS 10_1101-436634 751 12 . . . 10_1101-436634 752 1 Red red JJ 10_1101-436634 752 2 marks mark NNS 10_1101-436634 752 3 within within IN 10_1101-436634 752 4 cells cell NNS 10_1101-436634 752 5 indicate indicate VBP 10_1101-436634 752 6 that that IN 10_1101-436634 752 7 the the DT 10_1101-436634 752 8 gene gene NN 10_1101-436634 752 9 was be VBD 10_1101-436634 752 10 annotated annotate VBN 10_1101-436634 752 11 by by IN 10_1101-436634 752 12 CHASMplus CHASMplus NNP 10_1101-436634 752 13 as as IN 10_1101-436634 752 14 a a DT 10_1101-436634 752 15 driver driver NN 10_1101-436634 752 16 within within IN 10_1101-436634 752 17 a a DT 10_1101-436634 752 18 given give VBN 10_1101-436634 752 19 TCGA tcga NN 10_1101-436634 752 20 cohort cohort NN 10_1101-436634 752 21 . . . 
10_1101-436634 753 1 B b NN 10_1101-436634 753 2 ) ) -RRB- 10_1101-436634 753 3 Rows row NNS 10_1101-436634 753 4 correspond correspond VB 10_1101-436634 753 5 to to IN 10_1101-436634 753 6 the the DT 10_1101-436634 753 7 40 40 CD 10_1101-436634 753 8 most most RBS 10_1101-436634 753 9 frequently frequently RB 10_1101-436634 753 10 recurring recur VBG 10_1101-436634 753 11 genes gene NNS 10_1101-436634 753 12 , , , 10_1101-436634 753 13 as as IN 10_1101-436634 753 14 ranked rank VBN 10_1101-436634 753 15 by by IN 10_1101-436634 753 16 fraction fraction NN 10_1101-436634 753 17 of of IN 10_1101-436634 753 18 samples sample NNS 10_1101-436634 753 19 . . . 10_1101-436634 754 1 Shading shade VBG 10_1101-436634 754 2 corresponds correspond NNS 10_1101-436634 754 3 to to IN 10_1101-436634 754 4 the the DT 10_1101-436634 754 5 fraction fraction NN 10_1101-436634 754 6 of of IN 10_1101-436634 754 7 samples sample NNS 10_1101-436634 754 8 and and CC 10_1101-436634 754 9 columns column NNS 10_1101-436634 754 10 represent represent VBP 10_1101-436634 754 11 cancer cancer NN 10_1101-436634 754 12 types type NNS 10_1101-436634 754 13 . . . 10_1101-436634 755 1 Red red JJ 10_1101-436634 755 2 marks mark NNS 10_1101-436634 755 3 within within IN 10_1101-436634 755 4 cells cell NNS 10_1101-436634 755 5 indicate indicate VBP 10_1101-436634 755 6 that that IN 10_1101-436634 755 7 the the DT 10_1101-436634 755 8 gene gene NN 10_1101-436634 755 9 was be VBD 10_1101-436634 755 10 annotated annotate VBN 10_1101-436634 755 11 by by IN 10_1101-436634 755 12 CHASMplus CHASMplus NNP 10_1101-436634 755 13 as as IN 10_1101-436634 755 14 a a DT 10_1101-436634 755 15 driver driver NN 10_1101-436634 755 16 within within IN 10_1101-436634 755 17 a a DT 10_1101-436634 755 18 given give VBN 10_1101-436634 755 19 TCGA tcga NN 10_1101-436634 755 20 cohort cohort NN 10_1101-436634 755 21 . . . 
10_1101-436634 757 31 Supplementary Supplementary NNP 10_1101-436634 757 32 Figure Figure NNP 10_1101-436634 757 33 8 8 CD 10_1101-436634 757 34 . . . 10_1101-436634 758 1 TCGA tcga JJ 10_1101-436634 758 2 pan pan NN 10_1101-436634 758 3 - - HYPH 10_1101-436634 758 4 cancer cancer NN 10_1101-436634 758 5 analysis analysis NN 10_1101-436634 758 6 reveals reveal VBZ 10_1101-436634 758 7 genes gene NNS 10_1101-436634 758 8 recurrently recurrently RB 10_1101-436634 758 9 disrupted disrupt VBN 10_1101-436634 758 10 by by IN 10_1101-436634 758 11 variants variant NNS 10_1101-436634 758 12 which which WDT 10_1101-436634 758 13 cause cause VBP 10_1101-436634 758 14 non non JJ 10_1101-436634 758 15 - - JJ 10_1101-436634 758 16 canonical canonical JJ 10_1101-436634 758 17 splicing splicing NN 10_1101-436634 758 18 patterns pattern NNS 10_1101-436634 758 19 Results Results NNPS 10_1101-436634 758 20 of of IN 10_1101-436634 758 21 analysis analysis NN 10_1101-436634 758 22 for for IN 10_1101-436634 758 23 recurrently recurrently RB 10_1101-436634 758 24 disrupted disrupt VBN 10_1101-436634 758 25 genes gene NNS 10_1101-436634 758 26 in in IN 10_1101-436634 758 27 each each DT 10_1101-436634 758 28 TCGA TCGA NNP 10_1101-436634 758 29 cohort cohort NN 10_1101-436634 758 30 . . . 
10_1101-436634 759 1 A a LS 10_1101-436634 759 2 ) ) -RRB- 10_1101-436634 759 3 Rows row NNS 10_1101-436634 759 4 correspond correspond VBP 10_1101-436634 759 5 to to IN 10_1101-436634 759 6 the the DT 10_1101-436634 759 7 40 40 CD 10_1101-436634 759 8 most most RBS 10_1101-436634 759 9 frequently frequently RB 10_1101-436634 759 10 recurring recur VBG 10_1101-436634 759 11 genes gene NNS 10_1101-436634 759 12 , , , 10_1101-436634 759 13 as as IN 10_1101-436634 759 14 ranked rank VBN 10_1101-436634 759 15 by by IN 10_1101-436634 759 16 binomial binomial NNP 10_1101-436634 759 17 p p NN 10_1101-436634 759 18 - - HYPH 10_1101-436634 759 19 value value NN 10_1101-436634 759 20 . . . 10_1101-436634 760 1 Genes gene NNS 10_1101-436634 760 2 are be VBP 10_1101-436634 760 3 clustered cluster VBN 10_1101-436634 760 4 by by IN 10_1101-436634 760 5 whether whether IN 10_1101-436634 760 6 they -PRON- PRP 10_1101-436634 760 7 were be VBD 10_1101-436634 760 8 annotated annotate VBN 10_1101-436634 760 9 by by IN 10_1101-436634 760 10 the the DT 10_1101-436634 760 11 CGC CGC NNP 10_1101-436634 760 12 as as IN 10_1101-436634 760 13 an an DT 10_1101-436634 760 14 oncogene oncogene NN 10_1101-436634 760 15 ( ( -LRB- 10_1101-436634 760 16 red red NNP 10_1101-436634 760 17 ) ) -RRB- 10_1101-436634 760 18 , , , 10_1101-436634 760 19 an an DT 10_1101-436634 760 20 oncogene oncogene NN 10_1101-436634 760 21 and and CC 10_1101-436634 760 22 tumor tumor NN 10_1101-436634 760 23 suppressor suppressor NN 10_1101-436634 760 24 gene gene NN 10_1101-436634 760 25 ( ( -LRB- 10_1101-436634 760 26 yellow yellow NNP 10_1101-436634 760 27 ) ) -RRB- 10_1101-436634 760 28 , , , 10_1101-436634 760 29 a a DT 10_1101-436634 760 30 tumor tumor NN 10_1101-436634 760 31 suppressor suppressor NN 10_1101-436634 760 32 gene gene NN 10_1101-436634 760 33 ( ( -LRB- 10_1101-436634 760 34 green green NNP 10_1101-436634 760 35 ) ) -RRB- 10_1101-436634 760 36 , , , 10_1101-436634 760 37 or or CC 10_1101-436634 760 
38 another another DT 10_1101-436634 760 39 type type NN 10_1101-436634 760 40 of of IN 10_1101-436634 760 41 cancer cancer NN 10_1101-436634 760 42 - - HYPH 10_1101-436634 760 43 relevant relevant JJ 10_1101-436634 760 44 gene gene NN 10_1101-436634 760 45 . . . 10_1101-436634 761 1 Shading shade VBG 10_1101-436634 761 2 corresponds correspond NNS 10_1101-436634 761 3 to to IN 10_1101-436634 761 4 −log10(p −log10(p ADD 10_1101-436634 761 5 value value NN 10_1101-436634 761 6 ) ) -RRB- 10_1101-436634 761 7 and and CC 10_1101-436634 761 8 columns column NNS 10_1101-436634 761 9 represent represent VBP 10_1101-436634 761 10 cancer cancer NN 10_1101-436634 761 11 types type NNS 10_1101-436634 761 12 . . . 10_1101-436634 762 1 Red red JJ 10_1101-436634 762 2 marks mark NNS 10_1101-436634 762 3 within within IN 10_1101-436634 762 4 cells cell NNS 10_1101-436634 762 5 indicate indicate VBP 10_1101-436634 762 6 that that IN 10_1101-436634 762 7 the the DT 10_1101-436634 762 8 gene gene NN 10_1101-436634 762 9 was be VBD 10_1101-436634 762 10 annotated annotate VBN 10_1101-436634 762 11 by by IN 10_1101-436634 762 12 CHASMplus CHASMplus NNP 10_1101-436634 762 13 as as IN 10_1101-436634 762 14 a a DT 10_1101-436634 762 15 driver driver NN 10_1101-436634 762 16 within within IN 10_1101-436634 762 17 a a DT 10_1101-436634 762 18 given give VBN 10_1101-436634 762 19 TCGA tcga NN 10_1101-436634 762 20 cohort cohort NN 10_1101-436634 762 21 . . . 
10_1101-436634 763 1 B b NN 10_1101-436634 763 2 ) ) -RRB- 10_1101-436634 763 3 Rows row NNS 10_1101-436634 763 4 correspond correspond VB 10_1101-436634 763 5 to to IN 10_1101-436634 763 6 the the DT 10_1101-436634 763 7 40 40 CD 10_1101-436634 763 8 most most RBS 10_1101-436634 763 9 frequently frequently RB 10_1101-436634 763 10 recurring recur VBG 10_1101-436634 763 11 genes gene NNS 10_1101-436634 763 12 , , , 10_1101-436634 763 13 as as IN 10_1101-436634 763 14 ranked rank VBN 10_1101-436634 763 15 by by IN 10_1101-436634 763 16 fraction fraction NN 10_1101-436634 763 17 of of IN 10_1101-436634 763 18 samples sample NNS 10_1101-436634 763 19 . . . 
10_1101-436634 765 31 Shading Shading NNP 10_1101-436634 765 32 corresponds correspond NNS 10_1101-436634 765 33 to to IN 10_1101-436634 765 34 the the DT 10_1101-436634 765 35 fraction fraction NN 10_1101-436634 765 36 of of IN 10_1101-436634 765 37 samples sample NNS 10_1101-436634 765 38 and and CC 10_1101-436634 765 39 columns column NNS 10_1101-436634 765 40 represent represent VBP 10_1101-436634 765 41 cancer cancer NN 10_1101-436634 765 42 types type NNS 10_1101-436634 765 43 . . . 
10_1101-436634 766 1 Red red JJ 10_1101-436634 766 2 marks mark NNS 10_1101-436634 766 3 within within IN 10_1101-436634 766 4 cells cell NNS 10_1101-436634 766 5 indicate indicate VBP 10_1101-436634 766 6 that that IN 10_1101-436634 766 7 the the DT 10_1101-436634 766 8 gene gene NN 10_1101-436634 766 9 was be VBD 10_1101-436634 766 10 annotated annotate VBN 10_1101-436634 766 11 by by IN 10_1101-436634 766 12 CHASMplus CHASMplus NNP 10_1101-436634 766 13 as as IN 10_1101-436634 766 14 a a DT 10_1101-436634 766 15 driver driver NN 10_1101-436634 766 16 within within IN 10_1101-436634 766 17 a a DT 10_1101-436634 766 18 given give VBN 10_1101-436634 766 19 TCGA tcga NN 10_1101-436634 766 20 cohort cohort NN 10_1101-436634 766 21 . . . 10_1101-436634 767 1 Supplementary Supplementary NNP 10_1101-436634 767 2 Figure Figure NNP 10_1101-436634 767 3 9 9 CD 10_1101-436634 767 4 . . . 10_1101-436634 768 1 TCGA tcga JJ 10_1101-436634 768 2 pan pan NN 10_1101-436634 768 3 - - HYPH 10_1101-436634 768 4 cancer cancer NN 10_1101-436634 768 5 analysis analysis NN 10_1101-436634 768 6 reveals reveal VBZ 10_1101-436634 768 7 genes gene NNS 10_1101-436634 768 8 recurrently recurrently RB 10_1101-436634 768 9 disrupted disrupt VBN 10_1101-436634 768 10 by by IN 10_1101-436634 768 11 variants variant NNS 10_1101-436634 768 12 which which WDT 10_1101-436634 768 13 promote promote VBP 10_1101-436634 768 14 splicing splicing NN 10_1101-436634 768 15 of of IN 10_1101-436634 768 16 particular particular JJ 10_1101-436634 768 17 canonical canonical JJ 10_1101-436634 768 18 junctions junction NNS 10_1101-436634 768 19 Results Results NNPS 10_1101-436634 768 20 of of IN 10_1101-436634 768 21 analysis analysis NN 10_1101-436634 768 22 for for IN 10_1101-436634 768 23 recurrently recurrently RB 10_1101-436634 768 24 disrupted disrupt VBN 10_1101-436634 768 25 genes gene NNS 10_1101-436634 768 26 in in IN 10_1101-436634 768 27 each each DT 10_1101-436634 768 28 TCGA TCGA NNP 10_1101-436634 
768 29 cohort cohort NN 10_1101-436634 768 30 . . . 10_1101-436634 769 1 A a LS 10_1101-436634 769 2 ) ) -RRB- 10_1101-436634 769 3 Rows row NNS 10_1101-436634 769 4 correspond correspond VBP 10_1101-436634 769 5 to to IN 10_1101-436634 769 6 the the DT 10_1101-436634 769 7 40 40 CD 10_1101-436634 769 8 most most RBS 10_1101-436634 769 9 frequently frequently RB 10_1101-436634 769 10 recurring recur VBG 10_1101-436634 769 11 genes gene NNS 10_1101-436634 769 12 , , , 10_1101-436634 769 13 as as IN 10_1101-436634 769 14 ranked rank VBN 10_1101-436634 769 15 by by IN 10_1101-436634 769 16 binomial binomial NNP 10_1101-436634 769 17 p p NN 10_1101-436634 769 18 - - HYPH 10_1101-436634 769 19 value value NN 10_1101-436634 769 20 . . . 10_1101-436634 770 1 Genes gene NNS 10_1101-436634 770 2 are be VBP 10_1101-436634 770 3 clustered cluster VBN 10_1101-436634 770 4 by by IN 10_1101-436634 770 5 whether whether IN 10_1101-436634 770 6 they -PRON- PRP 10_1101-436634 770 7 were be VBD 10_1101-436634 770 8 annotated annotate VBN 10_1101-436634 770 9 by by IN 10_1101-436634 770 10 the the DT 10_1101-436634 770 11 CGC CGC NNP 10_1101-436634 770 12 as as IN 10_1101-436634 770 13 an an DT 10_1101-436634 770 14 oncogene oncogene NN 10_1101-436634 770 15 ( ( -LRB- 10_1101-436634 770 16 red red NNP 10_1101-436634 770 17 ) ) -RRB- 10_1101-436634 770 18 , , , 10_1101-436634 770 19 an an DT 10_1101-436634 770 20 oncogene oncogene NN 10_1101-436634 770 21 and and CC 10_1101-436634 770 22 tumor tumor NN 10_1101-436634 770 23 suppressor suppressor NN 10_1101-436634 770 24 gene gene NN 10_1101-436634 770 25 ( ( -LRB- 10_1101-436634 770 26 yellow yellow NNP 10_1101-436634 770 27 ) ) -RRB- 10_1101-436634 770 28 , , , 10_1101-436634 770 29 a a DT 10_1101-436634 770 30 tumor tumor NN 10_1101-436634 770 31 suppressor suppressor NN 10_1101-436634 770 32 gene gene NN 10_1101-436634 770 33 ( ( -LRB- 10_1101-436634 770 34 green green NNP 10_1101-436634 770 35 ) ) -RRB- 10_1101-436634 770 36 , , , 
10_1101-436634 770 37 or or CC 10_1101-436634 770 38 another another DT 10_1101-436634 770 39 type type NN 10_1101-436634 770 40 of of IN 10_1101-436634 770 41 cancer cancer NN 10_1101-436634 770 42 - - HYPH 10_1101-436634 770 43 relevant relevant JJ 10_1101-436634 770 44 gene gene NN 10_1101-436634 770 45 . . . 10_1101-436634 771 1 Shading shade VBG 10_1101-436634 771 2 corresponds correspond NNS 10_1101-436634 771 3 to to IN 10_1101-436634 771 4 −log10(p −log10(p ADD 10_1101-436634 771 5 value value NN 10_1101-436634 771 6 ) ) -RRB- 10_1101-436634 771 7 and and CC 10_1101-436634 771 8 columns column NNS 10_1101-436634 771 9 represent represent VBP 10_1101-436634 771 10 cancer cancer NN 10_1101-436634 771 11 types type NNS 10_1101-436634 771 12 . . . 10_1101-436634 772 1 Red red JJ 10_1101-436634 772 2 marks mark NNS 10_1101-436634 772 3 within within IN 10_1101-436634 772 4 cells cell NNS 10_1101-436634 772 5 indicate indicate VBP 10_1101-436634 772 6 that that IN 10_1101-436634 772 7 the the DT 
10_1101-436634 773 31 gene gene NN 10_1101-436634 773 32 was be VBD 10_1101-436634 773 33 annotated annotate VBN 10_1101-436634 773 34 by by IN 10_1101-436634 773 35 CHASMplus CHASMplus NNP 10_1101-436634 773 36 as as IN 10_1101-436634 773 37 a a DT 10_1101-436634 773 38 driver driver NN 10_1101-436634 773 39 within within IN 10_1101-436634 773 40 a a DT 10_1101-436634 773 41 given give VBN 10_1101-436634 773 42 TCGA tcga NN 10_1101-436634 773 43 cohort cohort NN 10_1101-436634 773 44 . . . 
10_1101-436634 774 1 B b NN 10_1101-436634 774 2 ) ) -RRB- 10_1101-436634 774 3 Rows row NNS 10_1101-436634 774 4 correspond correspond VB 10_1101-436634 774 5 to to IN 10_1101-436634 774 6 the the DT 10_1101-436634 774 7 40 40 CD 10_1101-436634 774 8 most most RBS 10_1101-436634 774 9 frequently frequently RB 10_1101-436634 774 10 recurring recur VBG 10_1101-436634 774 11 genes gene NNS 10_1101-436634 774 12 , , , 10_1101-436634 774 13 as as IN 10_1101-436634 774 14 ranked rank VBN 10_1101-436634 774 15 by by IN 10_1101-436634 774 16 fraction fraction NN 10_1101-436634 774 17 of of IN 10_1101-436634 774 18 samples sample NNS 10_1101-436634 774 19 . . . 10_1101-436634 775 1 Shading shade VBG 10_1101-436634 775 2 corresponds correspond NNS 10_1101-436634 775 3 to to IN 10_1101-436634 775 4 the the DT 10_1101-436634 775 5 fraction fraction NN 10_1101-436634 775 6 of of IN 10_1101-436634 775 7 samples sample NNS 10_1101-436634 775 8 and and CC 10_1101-436634 775 9 columns column NNS 10_1101-436634 775 10 represent represent VBP 10_1101-436634 775 11 cancer cancer NN 10_1101-436634 775 12 types type NNS 10_1101-436634 775 13 . . . 10_1101-436634 776 1 Red red JJ 10_1101-436634 776 2 marks mark NNS 10_1101-436634 776 3 within within IN 10_1101-436634 776 4 cells cell NNS 10_1101-436634 776 5 indicate indicate VBP 10_1101-436634 776 6 that that IN 10_1101-436634 776 7 the the DT 10_1101-436634 776 8 gene gene NN 10_1101-436634 776 9 was be VBD 10_1101-436634 776 10 annotated annotate VBN 10_1101-436634 776 11 by by IN 10_1101-436634 776 12 CHASMplus CHASMplus NNP 10_1101-436634 776 13 as as IN 10_1101-436634 776 14 a a DT 10_1101-436634 776 15 driver driver NN 10_1101-436634 776 16 within within IN 10_1101-436634 776 17 a a DT 10_1101-436634 776 18 given give VBN 10_1101-436634 776 19 TCGA tcga NN 10_1101-436634 776 20 cohort cohort NN 10_1101-436634 776 21 . . . 
10_1101-436634 777 1 Supplementary Supplementary NNP 10_1101-436634 777 2 Figure Figure NNP 10_1101-436634 777 3 10 10 CD 10_1101-436634 777 4 . . . 10_1101-436634 778 1 Analysis Analysis NNP 10_1101-436634 778 2 of of IN 10_1101-436634 778 3 HCC HCC NNP 10_1101-436634 778 4 , , , 10_1101-436634 778 5 OSCC OSCC NNP 10_1101-436634 778 6 , , , 10_1101-436634 778 7 and and CC 10_1101-436634 778 8 SCLC SCLC NNP 10_1101-436634 778 9 cohorts cohort NNS 10_1101-436634 778 10 reveals reveal VBZ 10_1101-436634 778 11 genes gene NNS 10_1101-436634 778 12 recurrently recurrently RB 10_1101-436634 778 13 disrupted disrupt VBN 10_1101-436634 778 14 by by IN 10_1101-436634 778 15 variants variant NNS 10_1101-436634 778 16 which which WDT 10_1101-436634 778 17 cause cause VBP 10_1101-436634 778 18 non non JJ 10_1101-436634 778 19 - - JJ 10_1101-436634 778 20 canonical canonical JJ 10_1101-436634 778 21 splicing splicing NN 10_1101-436634 778 22 patterns pattern NNS 10_1101-436634 778 23 Results Results NNPS 10_1101-436634 778 24 of of IN 10_1101-436634 778 25 analysis analysis NN 10_1101-436634 778 26 for for IN 10_1101-436634 778 27 recurrently recurrently RB 10_1101-436634 778 28 disrupted disrupt VBN 10_1101-436634 778 29 genes gene NNS 10_1101-436634 778 30 in in IN 10_1101-436634 778 31 each each DT 10_1101-436634 778 32 MGI MGI NNP 10_1101-436634 778 33 cohort cohort NN 10_1101-436634 778 34 . . . 
10_1101-436634 779 1 A a LS 10_1101-436634 779 2 ) ) -RRB- 10_1101-436634 779 3 Rows row NNS 10_1101-436634 779 4 correspond correspond VB 10_1101-436634 779 5 to to IN 10_1101-436634 779 6 the the DT 10_1101-436634 779 7 3 3 CD 10_1101-436634 779 8 most most RBS 10_1101-436634 779 9 frequently frequently RB 10_1101-436634 779 10 recurring recur VBG 10_1101-436634 779 11 genes gene NNS 10_1101-436634 779 12 , , , 10_1101-436634 779 13 as as IN 10_1101-436634 779 14 ranked rank VBN 10_1101-436634 779 15 by by IN 10_1101-436634 779 16 binomial binomial NNP 10_1101-436634 779 17 p p NN 10_1101-436634 779 18 - - HYPH 10_1101-436634 779 19 value value NN 10_1101-436634 779 20 . . . 10_1101-436634 780 1 Shading shade VBG 10_1101-436634 780 2 corresponds correspond NNS 10_1101-436634 780 3 to to IN 10_1101-436634 780 4 −log10(p −log10(p ADD 10_1101-436634 780 5 value value NN 10_1101-436634 780 6 ) ) -RRB- 10_1101-436634 780 7 and and CC 10_1101-436634 780 8 columns column NNS 10_1101-436634 780 9 represent represent VBP 10_1101-436634 780 10 cancer cancer NN 10_1101-436634 780 11 types type NNS 10_1101-436634 780 12 . . . 10_1101-436634 781 1 B b NN 10_1101-436634 781 2 ) ) -RRB- 10_1101-436634 781 3 Rows row NNS 10_1101-436634 781 4 correspond correspond VB 10_1101-436634 781 5 to to IN 10_1101-436634 781 6 the the DT 10_1101-436634 781 7 3 3 CD 10_1101-436634 781 8 most most RBS 10_1101-436634 781 9 frequently frequently RB 10_1101-436634 781 10 recurring recur VBG 10_1101-436634 781 11 genes gene NNS 10_1101-436634 781 12 , , , 10_1101-436634 781 13 as as IN 10_1101-436634 781 14 ranked rank VBN 10_1101-436634 781 15 by by IN 10_1101-436634 781 16 fraction fraction NN 10_1101-436634 781 17 of of IN 10_1101-436634 781 18 samples sample NNS 10_1101-436634 781 19 . . . 
10_1101-436634 782 1 Shading shade VBG 10_1101-436634 782 2 corresponds correspond NNS 10_1101-436634 782 3 to to IN 10_1101-436634 782 4 the the DT 10_1101-436634 782 5 fraction fraction NN 10_1101-436634 782 6 of of IN 10_1101-436634 782 7 samples sample NNS 10_1101-436634 782 8 and and CC 10_1101-436634 782 9 columns column NNS 10_1101-436634 782 10 represent represent VBP 10_1101-436634 782 11 cancer cancer NN 10_1101-436634 782 12 types type NNS 10_1101-436634 782 13 . . . 10_1101-436634 784 31 Supplementary Supplementary NNP 10_1101-436634 784 32 Figure Figure NNP 10_1101-436634 784 33 11 11 CD 10_1101-436634 784 34 . . . 
10_1101-436634 785 1 Analysis Analysis NNP 10_1101-436634 785 2 of of IN 10_1101-436634 785 3 HCC HCC NNP 10_1101-436634 785 4 , , , 10_1101-436634 785 5 OSCC OSCC NNP 10_1101-436634 785 6 , , , 10_1101-436634 785 7 and and CC 10_1101-436634 785 8 SCLC SCLC NNP 10_1101-436634 785 9 cohorts cohort NNS 10_1101-436634 785 10 reveals reveal VBZ 10_1101-436634 785 11 genes gene NNS 10_1101-436634 785 12 recurrently recurrently RB 10_1101-436634 785 13 disrupted disrupt VBN 10_1101-436634 785 14 by by IN 10_1101-436634 785 15 variants variant NNS 10_1101-436634 785 16 which which WDT 10_1101-436634 785 17 promote promote VBP 10_1101-436634 785 18 splicing splicing NN 10_1101-436634 785 19 of of IN 10_1101-436634 785 20 particular particular JJ 10_1101-436634 785 21 canonical canonical JJ 10_1101-436634 785 22 junctions junction NNS 10_1101-436634 785 23 Results Results NNPS 10_1101-436634 785 24 of of IN 10_1101-436634 785 25 analysis analysis NN 10_1101-436634 785 26 for for IN 10_1101-436634 785 27 recurrently recurrently RB 10_1101-436634 785 28 disrupted disrupt VBN 10_1101-436634 785 29 genes gene NNS 10_1101-436634 785 30 in in IN 10_1101-436634 785 31 each each DT 10_1101-436634 785 32 TCGA TCGA NNP 10_1101-436634 785 33 cohort cohort NN 10_1101-436634 785 34 . . . 10_1101-436634 786 1 A a LS 10_1101-436634 786 2 ) ) -RRB- 10_1101-436634 786 3 Rows row NNS 10_1101-436634 786 4 correspond correspond VB 10_1101-436634 786 5 to to IN 10_1101-436634 786 6 the the DT 10_1101-436634 786 7 4 4 CD 10_1101-436634 786 8 most most RBS 10_1101-436634 786 9 frequently frequently RB 10_1101-436634 786 10 recurring recur VBG 10_1101-436634 786 11 genes gene NNS 10_1101-436634 786 12 , , , 10_1101-436634 786 13 as as IN 10_1101-436634 786 14 ranked rank VBN 10_1101-436634 786 15 by by IN 10_1101-436634 786 16 binomial binomial NNP 10_1101-436634 786 17 p p NN 10_1101-436634 786 18 - - HYPH 10_1101-436634 786 19 value value NN 10_1101-436634 786 20 . . . 
10_1101-436634 787 1 Shading shade VBG 10_1101-436634 787 2 corresponds correspond NNS 10_1101-436634 787 3 to to IN 10_1101-436634 787 4 −log10(p −log10(p ADD 10_1101-436634 787 5 value value NN 10_1101-436634 787 6 ) ) -RRB- 10_1101-436634 787 7 and and CC 10_1101-436634 787 8 columns column NNS 10_1101-436634 787 9 represent represent VBP 10_1101-436634 787 10 cancer cancer NN 10_1101-436634 787 11 types type NNS 10_1101-436634 787 12 . . . 10_1101-436634 788 1 B b NN 10_1101-436634 788 2 ) ) -RRB- 10_1101-436634 788 3 Rows row NNS 10_1101-436634 788 4 correspond correspond VB 10_1101-436634 788 5 to to IN 10_1101-436634 788 6 the the DT 10_1101-436634 788 7 4 4 CD 10_1101-436634 788 8 most most RBS 10_1101-436634 788 9 frequently frequently RB 10_1101-436634 788 10 recurring recur VBG 10_1101-436634 788 11 genes gene NNS 10_1101-436634 788 12 , , , 10_1101-436634 788 13 as as IN 10_1101-436634 788 14 ranked rank VBN 10_1101-436634 788 15 by by IN 10_1101-436634 788 16 fraction fraction NN 10_1101-436634 788 17 of of IN 10_1101-436634 788 18 samples sample NNS 10_1101-436634 788 19 . . . 10_1101-436634 789 1 Shading shade VBG 10_1101-436634 789 2 corresponds correspond NNS 10_1101-436634 789 3 to to IN 10_1101-436634 789 4 the the DT 10_1101-436634 789 5 fraction fraction NN 10_1101-436634 789 6 of of IN 10_1101-436634 789 7 samples sample NNS 10_1101-436634 789 8 and and CC 10_1101-436634 789 9 columns column NNS 10_1101-436634 789 10 represent represent VBP 10_1101-436634 789 11 cancer cancer NN 10_1101-436634 789 12 types type NNS 10_1101-436634 789 13 . . . 
10_1101-436634 791 31 Supplementary Supplementary NNP 10_1101-436634 791 32 Figure Figure NNP 10_1101-436634 791 33 12 12 CD 10_1101-436634 791 34 : : : 10_1101-436634 791 35 Intronic intronic JJ 10_1101-436634 791 36 SNV SNV NNP 10_1101-436634 791 37 in in IN 10_1101-436634 791 38 TP53 TP53 NNP 10_1101-436634 791 39 associated associate VBN 10_1101-436634 791 40 with with IN 10_1101-436634 791 41 alternative alternative JJ 10_1101-436634 791 42 donor donor NN 10_1101-436634 791 43 usage usage NN 10_1101-436634 791 44 . . . 10_1101-436634 792 1 A a LS 10_1101-436634 792 2 ) ) -RRB- 10_1101-436634 792 3 IGV igv NN 10_1101-436634 792 4 snapshot snapshot NN 10_1101-436634 792 5 of of IN 10_1101-436634 792 6 a a DT 10_1101-436634 792 7 single single JJ 10_1101-436634 792 8 nucleotide nucleotide JJ 10_1101-436634 792 9 variant variant NN 10_1101-436634 792 10 ( ( -LRB- 10_1101-436634 792 11 GRCh38 grch38 NN 10_1101-436634 792 12 , , , 10_1101-436634 792 13 chr17 chr17 NN 10_1101-436634 792 14 : : : 10_1101-436634 792 15 g.7673609C g.7673609c XX 10_1101-436634 792 16 > > XX 10_1101-436634 792 17 A a NN 10_1101-436634 792 18 ) ) -RRB- 10_1101-436634 792 19 within within IN 10_1101-436634 792 20 an an DT 10_1101-436634 792 21 intron intron NN 10_1101-436634 792 22 of of IN 10_1101-436634 792 23 TP53 TP53 NNP 10_1101-436634 792 24 in in IN 10_1101-436634 792 25 an an DT 10_1101-436634 792 26 OSCC OSCC NNP 10_1101-436634 792 27 sample sample NN 10_1101-436634 792 28 . . . 
10_1101-436634 793 1 This this DT 10_1101-436634 793 2 variant variant NN 10_1101-436634 793 3 is be VBZ 10_1101-436634 793 4 associated associate VBN 10_1101-436634 793 5 with with IN 10_1101-436634 793 6 an an DT 10_1101-436634 793 7 exon exon JJ 10_1101-436634 793 8 skipping skipping NN 10_1101-436634 793 9 event event NN 10_1101-436634 793 10 with with IN 10_1101-436634 793 11 23 23 CD 10_1101-436634 793 12 reads read NNS 10_1101-436634 793 13 of of IN 10_1101-436634 793 14 support support NN 10_1101-436634 793 15 and and CC 10_1101-436634 793 16 an an DT 10_1101-436634 793 17 alternate alternate JJ 10_1101-436634 793 18 acceptor acceptor NN 10_1101-436634 793 19 site site NN 10_1101-436634 793 20 usage usage NN 10_1101-436634 793 21 with with IN 10_1101-436634 793 22 41 41 CD 10_1101-436634 793 23 reads read NNS 10_1101-436634 793 24 of of IN 10_1101-436634 793 25 support support NN 10_1101-436634 793 26 . . . 10_1101-436634 794 1 This this DT 10_1101-436634 794 2 result result NN 10_1101-436634 794 3 was be VBD 10_1101-436634 794 4 found find VBN 10_1101-436634 794 5 using use VBG 10_1101-436634 794 6 the the DT 10_1101-436634 794 7 default default NN 10_1101-436634 794 8 splice splice NN 10_1101-436634 794 9 variant variant JJ 10_1101-436634 794 10 window window NN 10_1101-436634 794 11 parameter parameter NN 10_1101-436634 794 12 ( ( -LRB- 10_1101-436634 794 13 i2e3 i2e3 ADD 10_1101-436634 794 14 ) ) -RRB- 10_1101-436634 794 15 . . . 10_1101-436634 795 1 B b NN 10_1101-436634 795 2 ) ) -RRB- 10_1101-436634 795 3 Sashimi Sashimi NNP 10_1101-436634 795 4 plot plot NN 10_1101-436634 795 5 visualization visualization NN 10_1101-436634 795 6 of of IN 10_1101-436634 795 7 the the DT 10_1101-436634 795 8 novel novel JJ 10_1101-436634 795 9 junction junction NN 10_1101-436634 795 10 . . . 
10_1101-436634 797 31 Supplementary Supplementary NNP 10_1101-436634 797 32 Figure Figure NNP 10_1101-436634 797 33 13 13 CD 10_1101-436634 797 34 : : : 10_1101-436634 797 35 Intronic intronic JJ 10_1101-436634 797 36 deletion deletion NN 10_1101-436634 797 37 in in IN 10_1101-436634 797 38 RNF145 RNF145 NNP 10_1101-436634 797 39 associated associate VBN 10_1101-436634 797 40 with with IN 10_1101-436634 797 41 alternative alternative JJ 10_1101-436634 797 42 donor donor NN 10_1101-436634 797 43 usage usage NN 10_1101-436634 797 44 . . . 10_1101-436634 798 1 A a LS 10_1101-436634 798 2 ) ) -RRB- 10_1101-436634 798 3 IGV igv NN 10_1101-436634 798 4 snapshot snapshot NN 10_1101-436634 798 5 of of IN 10_1101-436634 798 6 a a DT 10_1101-436634 798 7 single single JJ 10_1101-436634 798 8 nucleotide nucleotide JJ 10_1101-436634 798 9 variant variant NN 10_1101-436634 798 10 ( ( -LRB- 10_1101-436634 798 11 GRCh38 GRCh38 NNP 10_1101-436634 798 12 , , , 10_1101-436634 798 13 chr5 chr5 NN 10_1101-436634 798 14 : : : 10_1101-436634 798 15 g.159169058delA g.159169058delA NNP 10_1101-436634 798 16 ) ) -RRB- 10_1101-436634 798 17 within within IN 10_1101-436634 798 18 an an DT 10_1101-436634 798 19 intron intron NN 10_1101-436634 798 20 of of IN 10_1101-436634 798 21 RNF145 RNF145 NNP 10_1101-436634 798 22 in in IN 10_1101-436634 798 23 COAD COAD NNP 10_1101-436634 798 24 samples sample NNS 10_1101-436634 798 25 . . . 
10_1101-436634 799 1 This this DT 10_1101-436634 799 2 variant variant NN 10_1101-436634 799 3 is be VBZ 10_1101-436634 799 4 associated associate VBN 10_1101-436634 799 5 with with IN 10_1101-436634 799 6 an an DT 10_1101-436634 799 7 exon exon JJ 10_1101-436634 799 8 skipping skipping NN 10_1101-436634 799 9 event event NN 10_1101-436634 799 10 with with IN 10_1101-436634 799 11 8 8 CD 10_1101-436634 799 12 and and CC 10_1101-436634 799 13 6 6 CD 10_1101-436634 799 14 reads read NNS 10_1101-436634 799 15 of of IN 10_1101-436634 799 16 support support NN 10_1101-436634 799 17 for for IN 10_1101-436634 799 18 the the DT 10_1101-436634 799 19 samples sample NNS 10_1101-436634 799 20 shown show VBN 10_1101-436634 799 21 . . . 10_1101-436634 800 1 This this DT 10_1101-436634 800 2 result result NN 10_1101-436634 800 3 was be VBD 10_1101-436634 800 4 found find VBN 10_1101-436634 800 5 using use VBG 10_1101-436634 800 6 the the DT 10_1101-436634 800 7 default default NN 10_1101-436634 800 8 splice splice NN 10_1101-436634 800 9 variant variant JJ 10_1101-436634 800 10 window window NN 10_1101-436634 800 11 parameter parameter NN 10_1101-436634 800 12 ( ( -LRB- 10_1101-436634 800 13 i2e3 i2e3 ADD 10_1101-436634 800 14 ) ) -RRB- 10_1101-436634 800 15 . . . 10_1101-436634 801 1 B b NN 10_1101-436634 801 2 ) ) -RRB- 10_1101-436634 801 3 Sashimi Sashimi NNP 10_1101-436634 801 4 plot plot NN 10_1101-436634 801 5 visualization visualization NN 10_1101-436634 801 6 of of IN 10_1101-436634 801 7 the the DT 10_1101-436634 801 8 novel novel JJ 10_1101-436634 801 9 junction junction NN 10_1101-436634 801 10 . . . 
10_1101-436634 804 31 Supplementary Supplementary NNP 10_1101-436634 804 32 Figure Figure NNP 10_1101-436634 804 33 14 14 CD 10_1101-436634 804 34 : : : 10_1101-436634 804 35 Several several JJ 10_1101-436634 804 36 SNVs SNVs NNPS 10_1101-436634 804 37 in in IN 10_1101-436634 804 38 CDKN2A CDKN2A NNP 10_1101-436634 804 39 associated associate VBN 10_1101-436634 804 40 with with IN 10_1101-436634 804 41 alternate alternate JJ 10_1101-436634 804 42 donor donor NN 10_1101-436634 804 43 usage usage NN 10_1101-436634 804 44 . . . 
10_1101-436634 805 1 A a LS 10_1101-436634 805 2 ) ) -RRB- 10_1101-436634 805 3 IGV igv NN 10_1101-436634 805 4 snapshot snapshot NN 10_1101-436634 805 5 of of IN 10_1101-436634 805 6 three three CD 10_1101-436634 805 7 variant variant JJ 10_1101-436634 805 8 positions position NNS 10_1101-436634 805 9 in in IN 10_1101-436634 805 10 CDKN2A CDKN2A NNP 10_1101-436634 805 11 found find VBD 10_1101-436634 805 12 to to TO 10_1101-436634 805 13 be be VB 10_1101-436634 805 14 associated associate VBN 10_1101-436634 805 15 with with IN 10_1101-436634 805 16 usage usage NN 10_1101-436634 805 17 of of IN 10_1101-436634 805 18 an an DT 10_1101-436634 805 19 alternate alternate JJ 10_1101-436634 805 20 donor donor NN 10_1101-436634 805 21 site site NN 10_1101-436634 805 22 that that WDT 10_1101-436634 805 23 leads lead VBZ 10_1101-436634 805 24 to to IN 10_1101-436634 805 25 formation formation NN 10_1101-436634 805 26 of of IN 10_1101-436634 805 27 an an DT 10_1101-436634 805 28 alternate alternate JJ 10_1101-436634 805 29 known know VBN 10_1101-436634 805 30 transcript transcript NN 10_1101-436634 805 31 . . . 10_1101-436634 806 1 This this DT 10_1101-436634 806 2 result result NN 10_1101-436634 806 3 was be VBD 10_1101-436634 806 4 found find VBN 10_1101-436634 806 5 using use VBG 10_1101-436634 806 6 the the DT 10_1101-436634 806 7 default default NN 10_1101-436634 806 8 splice splice NN 10_1101-436634 806 9 variant variant JJ 10_1101-436634 806 10 window window NN 10_1101-436634 806 11 parameter parameter NN 10_1101-436634 806 12 ( ( -LRB- 10_1101-436634 806 13 i2e3 i2e3 ADD 10_1101-436634 806 14 ) ) -RRB- 10_1101-436634 806 15 for for IN 10_1101-436634 806 16 known know VBN 10_1101-436634 806 17 ( ( -LRB- 10_1101-436634 806 18 DA DA NNP 10_1101-436634 806 19 ) ) -RRB- 10_1101-436634 806 20 junctions junction NNS 10_1101-436634 806 21 . . . 
10_1101-436634 807 1 B b NN 10_1101-436634 807 2 ) ) -RRB- 10_1101-436634 807 3 Zoomed Zoomed NNP 10_1101-436634 807 4 in in IN 10_1101-436634 807 5 view view NN 10_1101-436634 807 6 of of IN 10_1101-436634 807 7 the the DT 10_1101-436634 807 8 variants variant NNS 10_1101-436634 807 9 identified identify VBN 10_1101-436634 807 10 by by IN 10_1101-436634 807 11 RegTools regtool NNS 10_1101-436634 807 12 that that WDT 10_1101-436634 807 13 are be VBP 10_1101-436634 807 14 associated associate VBN 10_1101-436634 807 15 with with IN 10_1101-436634 807 16 alternate alternate JJ 10_1101-436634 807 17 donor donor NN 10_1101-436634 807 18 usage usage NN 10_1101-436634 807 19 . . . 10_1101-436634 808 1 Two two CD 10_1101-436634 808 2 of of IN 10_1101-436634 808 3 these these DT 10_1101-436634 808 4 variant variant JJ 10_1101-436634 808 5 positions position NNS 10_1101-436634 808 6 flank flank VBP 10_1101-436634 808 7 the the DT 10_1101-436634 808 8 donor donor NN 10_1101-436634 808 9 site site NN 10_1101-436634 808 10 that that WDT 10_1101-436634 808 11 is be VBZ 10_1101-436634 808 12 no no DT 10_1101-436634 808 13 longer long RBR 10_1101-436634 808 14 being be VBG 10_1101-436634 808 15 used use VBN 10_1101-436634 808 16 . . . 10_1101-436634 809 1 C C NNP 10_1101-436634 809 2 ) ) -RRB- 10_1101-436634 809 3 Sashimi Sashimi NNP 10_1101-436634 809 4 plot plot NN 10_1101-436634 809 5 visualizations visualization NNS 10_1101-436634 809 6 for for IN 10_1101-436634 809 7 samples sample NNS 10_1101-436634 809 8 containing contain VBG 10_1101-436634 809 9 the the DT 10_1101-436634 809 10 identified identify VBN 10_1101-436634 809 11 variants variant NNS 10_1101-436634 809 12 that that WDT 10_1101-436634 809 13 show show VBP 10_1101-436634 809 14 alternate alternate JJ 10_1101-436634 809 15 donor donor NN 10_1101-436634 809 16 usage usage NN 10_1101-436634 809 17 . . . 