id sid tid token lemma pos 10_1101-2021_01_02_425006 1 1 Analysis analysis NN 10_1101-2021_01_02_425006 1 2 of of IN 10_1101-2021_01_02_425006 1 3 next- next- JJ 10_1101-2021_01_02_425006 1 4 and and CC 10_1101-2021_01_02_425006 1 5 third third JJ 10_1101-2021_01_02_425006 1 6 - - HYPH 10_1101-2021_01_02_425006 1 7 generation generation NN 10_1101-2021_01_02_425006 1 8 RNA RNA NNP 10_1101-2021_01_02_425006 1 9 - - HYPH 10_1101-2021_01_02_425006 1 10 Seq Seq NNP 10_1101-2021_01_02_425006 1 11 data datum NNS 10_1101-2021_01_02_425006 1 12 reveals reveal VBZ 10_1101-2021_01_02_425006 1 13 the the DT 10_1101-2021_01_02_425006 1 14 structures structure NNS 10_1101-2021_01_02_425006 1 15 of of IN 10_1101-2021_01_02_425006 1 16 alternative alternative JJ 10_1101-2021_01_02_425006 1 17 transcription transcription NN 10_1101-2021_01_02_425006 1 18 units unit NNS 10_1101-2021_01_02_425006 1 19 in in IN 10_1101-2021_01_02_425006 1 20 bacterial bacterial JJ 10_1101-2021_01_02_425006 1 21 genomes genome NNS 10_1101-2021_01_02_425006 1 22 1 1 CD 10_1101-2021_01_02_425006 1 23 Analysis Analysis NNP 10_1101-2021_01_02_425006 1 24 of of IN 10_1101-2021_01_02_425006 1 25 next- next- JJ 10_1101-2021_01_02_425006 1 26 and and CC 10_1101-2021_01_02_425006 1 27 third third JJ 10_1101-2021_01_02_425006 1 28 - - HYPH 10_1101-2021_01_02_425006 1 29 generation generation NN 10_1101-2021_01_02_425006 1 30 RNA RNA NNP 10_1101-2021_01_02_425006 1 31 - - HYPH 10_1101-2021_01_02_425006 1 32 Seq Seq NNP 10_1101-2021_01_02_425006 1 33 data datum NNS 10_1101-2021_01_02_425006 1 34 reveals reveal VBZ 10_1101-2021_01_02_425006 1 35 the the DT 10_1101-2021_01_02_425006 1 36 structures structure NNS 10_1101-2021_01_02_425006 1 37 of of IN 10_1101-2021_01_02_425006 1 38 1 1 CD 10_1101-2021_01_02_425006 1 39 alternative alternative JJ 10_1101-2021_01_02_425006 1 40 transcription transcription NN 10_1101-2021_01_02_425006 1 41 units unit NNS 10_1101-2021_01_02_425006 1 42 in in IN 
10_1101-2021_01_02_425006 1 43 bacterial bacterial JJ 10_1101-2021_01_02_425006 1 44 genomes genome NNS 10_1101-2021_01_02_425006 1 45 2 2 CD 10_1101-2021_01_02_425006 1 46 Qi Qi NNP 10_1101-2021_01_02_425006 1 47 Wang1 wang1 NN 10_1101-2021_01_02_425006 1 48 , , , 10_1101-2021_01_02_425006 1 49 Zhaoqian Zhaoqian NNP 10_1101-2021_01_02_425006 1 50 Liu1,2 Liu1,2 NNP 10_1101-2021_01_02_425006 1 51 , , , 10_1101-2021_01_02_425006 1 52 Bo Bo NNP 10_1101-2021_01_02_425006 1 53 Yan3 Yan3 NNP 10_1101-2021_01_02_425006 1 54 , , , 10_1101-2021_01_02_425006 1 55 Wen Wen NNP 10_1101-2021_01_02_425006 1 56 - - HYPH 10_1101-2021_01_02_425006 1 57 Chi Chi NNP 10_1101-2021_01_02_425006 1 58 Chou4 chou4 NN 10_1101-2021_01_02_425006 1 59 , , , 10_1101-2021_01_02_425006 1 60 Laurence Laurence NNP 10_1101-2021_01_02_425006 1 61 Ettwiller3 Ettwiller3 NNP 10_1101-2021_01_02_425006 1 62 , , , 10_1101-2021_01_02_425006 1 63 Qin Qin NNP 10_1101-2021_01_02_425006 1 64 Ma2,† Ma2,† NNP 10_1101-2021_01_02_425006 1 65 , , , 10_1101-2021_01_02_425006 1 66 and and CC 10_1101-2021_01_02_425006 1 67 Bingqiang Bingqiang NNP 10_1101-2021_01_02_425006 1 68 3 3 CD 10_1101-2021_01_02_425006 1 69 Liu1,† Liu1,† VBD 10_1101-2021_01_02_425006 1 70 4 4 CD 10_1101-2021_01_02_425006 1 71 1 1 CD 10_1101-2021_01_02_425006 1 72 School School NNP 10_1101-2021_01_02_425006 1 73 of of IN 10_1101-2021_01_02_425006 1 74 Mathematics Mathematics NNP 10_1101-2021_01_02_425006 1 75 , , , 10_1101-2021_01_02_425006 1 76 Shandong Shandong NNP 10_1101-2021_01_02_425006 1 77 University University NNP 10_1101-2021_01_02_425006 1 78 , , , 10_1101-2021_01_02_425006 1 79 Jinan Jinan NNP 10_1101-2021_01_02_425006 1 80 250200 250200 CD 10_1101-2021_01_02_425006 1 81 , , , 10_1101-2021_01_02_425006 1 82 China China NNP 10_1101-2021_01_02_425006 1 83 . . . 
10_1101-2021_01_02_425006 2 1 5 5 CD 10_1101-2021_01_02_425006 2 2 2 2 CD 10_1101-2021_01_02_425006 2 3 Department Department NNP 10_1101-2021_01_02_425006 2 4 of of IN 10_1101-2021_01_02_425006 2 5 Biomedical Biomedical NNP 10_1101-2021_01_02_425006 2 6 Informatics Informatics NNP 10_1101-2021_01_02_425006 2 7 , , , 10_1101-2021_01_02_425006 2 8 College College NNP 10_1101-2021_01_02_425006 2 9 of of IN 10_1101-2021_01_02_425006 2 10 Medicine Medicine NNP 10_1101-2021_01_02_425006 2 11 , , , 10_1101-2021_01_02_425006 2 12 The the DT 10_1101-2021_01_02_425006 2 13 Ohio Ohio NNP 10_1101-2021_01_02_425006 2 14 State State NNP 10_1101-2021_01_02_425006 2 15 University University NNP 10_1101-2021_01_02_425006 2 16 , , , 10_1101-2021_01_02_425006 2 17 Columbus Columbus NNP 10_1101-2021_01_02_425006 2 18 , , , 10_1101-2021_01_02_425006 2 19 6 6 CD 10_1101-2021_01_02_425006 2 20 OH oh NN 10_1101-2021_01_02_425006 2 21 43210 43210 CD 10_1101-2021_01_02_425006 2 22 , , , 10_1101-2021_01_02_425006 2 23 USA USA NNP 10_1101-2021_01_02_425006 2 24 . . . 10_1101-2021_01_02_425006 3 1 7 7 CD 10_1101-2021_01_02_425006 3 2 3 3 CD 10_1101-2021_01_02_425006 3 3 New New NNP 10_1101-2021_01_02_425006 3 4 England England NNP 10_1101-2021_01_02_425006 3 5 Biolabs Biolabs NNP 10_1101-2021_01_02_425006 3 6 Inc. Inc. NNP 10_1101-2021_01_02_425006 3 7 , , , 10_1101-2021_01_02_425006 3 8 Ipswich Ipswich NNP 10_1101-2021_01_02_425006 3 9 , , , 10_1101-2021_01_02_425006 3 10 MA MA NNP 10_1101-2021_01_02_425006 3 11 01938 01938 CD 10_1101-2021_01_02_425006 3 12 , , , 10_1101-2021_01_02_425006 3 13 USA USA NNP 10_1101-2021_01_02_425006 3 14 . . . 
10_1101-2021_01_02_425006 4 1 8 8 CD 10_1101-2021_01_02_425006 4 2 4 4 CD 10_1101-2021_01_02_425006 4 3 Infectious Infectious NNP 10_1101-2021_01_02_425006 4 4 Disease Disease NNP 10_1101-2021_01_02_425006 4 5 and and CC 10_1101-2021_01_02_425006 4 6 Microbiome Microbiome NNP 10_1101-2021_01_02_425006 4 7 Program Program NNP 10_1101-2021_01_02_425006 4 8 , , , 10_1101-2021_01_02_425006 4 9 Broad Broad NNP 10_1101-2021_01_02_425006 4 10 Institute Institute NNP 10_1101-2021_01_02_425006 4 11 of of IN 10_1101-2021_01_02_425006 4 12 MIT MIT NNP 10_1101-2021_01_02_425006 4 13 and and CC 10_1101-2021_01_02_425006 4 14 Harvard Harvard NNP 10_1101-2021_01_02_425006 4 15 , , , 10_1101-2021_01_02_425006 4 16 Cambridge Cambridge NNP 10_1101-2021_01_02_425006 4 17 , , , 10_1101-2021_01_02_425006 4 18 MA MA NNP 10_1101-2021_01_02_425006 4 19 9 9 CD 10_1101-2021_01_02_425006 4 20 02142 02142 CD 10_1101-2021_01_02_425006 4 21 , , , 10_1101-2021_01_02_425006 4 22 USA USA NNP 10_1101-2021_01_02_425006 4 23 . . . 10_1101-2021_01_02_425006 5 1 10 10 CD 10_1101-2021_01_02_425006 5 2 †Corresponding †corresponding NN 10_1101-2021_01_02_425006 5 3 author author NN 10_1101-2021_01_02_425006 5 4 . . . 10_1101-2021_01_02_425006 6 1 Email email NN 10_1101-2021_01_02_425006 6 2 : : : 10_1101-2021_01_02_425006 6 3 bingqiang@sdu.edu.cn bingqiang@sdu.edu.cn NNP 10_1101-2021_01_02_425006 6 4 ( ( -LRB- 10_1101-2021_01_02_425006 6 5 B.L. B.L. NNP 10_1101-2021_01_02_425006 7 1 ) ) -RRB- 10_1101-2021_01_02_425006 7 2 ; ; : 10_1101-2021_01_02_425006 7 3 qin.ma@osumc.edu qin.ma@osumc.edu NNP 10_1101-2021_01_02_425006 7 4 ( ( -LRB- 10_1101-2021_01_02_425006 7 5 Q.M. Q.M. 
NNP 10_1101-2021_01_02_425006 7 6 ) ) -RRB- 10_1101-2021_01_02_425006 8 1 11 11 CD 10_1101-2021_01_02_425006 8 2 ABSTRACT ABSTRACT NNP 10_1101-2021_01_02_425006 8 3 12 12 CD 10_1101-2021_01_02_425006 8 4 Alternative alternative JJ 10_1101-2021_01_02_425006 8 5 transcription transcription NN 10_1101-2021_01_02_425006 8 6 units unit NNS 10_1101-2021_01_02_425006 8 7 ( ( -LRB- 10_1101-2021_01_02_425006 8 8 ATUs ATUs NNPS 10_1101-2021_01_02_425006 8 9 ) ) -RRB- 10_1101-2021_01_02_425006 8 10 are be VBP 10_1101-2021_01_02_425006 8 11 dynamically dynamically RB 10_1101-2021_01_02_425006 8 12 encoded encode VBN 10_1101-2021_01_02_425006 8 13 under under IN 10_1101-2021_01_02_425006 8 14 different different JJ 10_1101-2021_01_02_425006 8 15 conditions condition NNS 10_1101-2021_01_02_425006 8 16 or or CC 10_1101-2021_01_02_425006 8 17 13 13 CD 10_1101-2021_01_02_425006 8 18 environmental environmental JJ 10_1101-2021_01_02_425006 8 19 stimuli stimulus NNS 10_1101-2021_01_02_425006 8 20 in in IN 10_1101-2021_01_02_425006 8 21 bacterial bacterial JJ 10_1101-2021_01_02_425006 8 22 genomes genome NNS 10_1101-2021_01_02_425006 8 23 , , , 10_1101-2021_01_02_425006 8 24 and and CC 10_1101-2021_01_02_425006 8 25 genome genome NN 10_1101-2021_01_02_425006 8 26 - - HYPH 10_1101-2021_01_02_425006 8 27 scale scale NN 10_1101-2021_01_02_425006 8 28 identification identification NN 10_1101-2021_01_02_425006 8 29 of of IN 10_1101-2021_01_02_425006 8 30 ATUs atu NNS 10_1101-2021_01_02_425006 8 31 is be VBZ 10_1101-2021_01_02_425006 8 32 essential essential JJ 10_1101-2021_01_02_425006 8 33 for for IN 10_1101-2021_01_02_425006 8 34 14 14 CD 10_1101-2021_01_02_425006 8 35 studying study VBG 10_1101-2021_01_02_425006 8 36 the the DT 10_1101-2021_01_02_425006 8 37 emergence emergence NN 10_1101-2021_01_02_425006 8 38 of of IN 10_1101-2021_01_02_425006 8 39 human human JJ 10_1101-2021_01_02_425006 8 40 diseases disease NNS 10_1101-2021_01_02_425006 8 41 caused cause VBN 
10_1101-2021_01_02_425006 8 42 by by IN 10_1101-2021_01_02_425006 8 43 bacterial bacterial JJ 10_1101-2021_01_02_425006 8 44 organisms organism NNS 10_1101-2021_01_02_425006 8 45 . . . 10_1101-2021_01_02_425006 9 1 However however RB 10_1101-2021_01_02_425006 9 2 , , , 10_1101-2021_01_02_425006 9 3 it -PRON- PRP 10_1101-2021_01_02_425006 9 4 is be VBZ 10_1101-2021_01_02_425006 9 5 unrealistic unrealistic JJ 10_1101-2021_01_02_425006 9 6 to to TO 10_1101-2021_01_02_425006 9 7 15 15 CD 10_1101-2021_01_02_425006 9 8 identify identify VB 10_1101-2021_01_02_425006 9 9 all all DT 10_1101-2021_01_02_425006 9 10 ATUs atu NNS 10_1101-2021_01_02_425006 9 11 using use VBG 10_1101-2021_01_02_425006 9 12 experimental experimental JJ 10_1101-2021_01_02_425006 9 13 techniques technique NNS 10_1101-2021_01_02_425006 9 14 , , , 10_1101-2021_01_02_425006 9 15 due due IN 10_1101-2021_01_02_425006 9 16 to to IN 10_1101-2021_01_02_425006 9 17 the the DT 10_1101-2021_01_02_425006 9 18 complexity complexity NN 10_1101-2021_01_02_425006 9 19 and and CC 10_1101-2021_01_02_425006 9 20 dynamic dynamic JJ 10_1101-2021_01_02_425006 9 21 nature nature NN 10_1101-2021_01_02_425006 9 22 of of IN 10_1101-2021_01_02_425006 9 23 ATUs atu NNS 10_1101-2021_01_02_425006 9 24 . . . 
10_1101-2021_01_02_425006 10 1 16 16 CD 10_1101-2021_01_02_425006 10 2 Here here RB 10_1101-2021_01_02_425006 10 3 we -PRON- PRP 10_1101-2021_01_02_425006 10 4 present present VBP 10_1101-2021_01_02_425006 10 5 the the DT 10_1101-2021_01_02_425006 10 6 first first JJ 10_1101-2021_01_02_425006 10 7 - - HYPH 10_1101-2021_01_02_425006 10 8 of of IN 10_1101-2021_01_02_425006 10 9 - - HYPH 10_1101-2021_01_02_425006 10 10 its -PRON- PRP$ 10_1101-2021_01_02_425006 10 11 - - HYPH 10_1101-2021_01_02_425006 10 12 kind kind NN 10_1101-2021_01_02_425006 10 13 computational computational JJ 10_1101-2021_01_02_425006 10 14 framework framework NN 10_1101-2021_01_02_425006 10 15 , , , 10_1101-2021_01_02_425006 10 16 named name VBN 10_1101-2021_01_02_425006 10 17 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 10 18 , , , 10_1101-2021_01_02_425006 10 19 for for IN 10_1101-2021_01_02_425006 10 20 genome genome NN 10_1101-2021_01_02_425006 10 21 - - HYPH 10_1101-2021_01_02_425006 10 22 scale scale NN 10_1101-2021_01_02_425006 10 23 ATU ATU NNP 10_1101-2021_01_02_425006 10 24 17 17 CD 10_1101-2021_01_02_425006 10 25 prediction prediction NN 10_1101-2021_01_02_425006 10 26 based base VBN 10_1101-2021_01_02_425006 10 27 on on IN 10_1101-2021_01_02_425006 10 28 next next JJ 10_1101-2021_01_02_425006 10 29 - - HYPH 10_1101-2021_01_02_425006 10 30 generation generation NN 10_1101-2021_01_02_425006 10 31 RNA RNA NNP 10_1101-2021_01_02_425006 10 32 - - HYPH 10_1101-2021_01_02_425006 10 33 Seq Seq NNP 10_1101-2021_01_02_425006 10 34 data datum NNS 10_1101-2021_01_02_425006 10 35 . . . 
10_1101-2021_01_02_425006 11 1 The the DT 10_1101-2021_01_02_425006 11 2 framework framework NN 10_1101-2021_01_02_425006 11 3 utilizes utilize VBZ 10_1101-2021_01_02_425006 11 4 a a DT 10_1101-2021_01_02_425006 11 5 convex convex NNP 10_1101-2021_01_02_425006 11 6 quadratic quadratic NNP 10_1101-2021_01_02_425006 11 7 18 18 CD 10_1101-2021_01_02_425006 11 8 .CC .CC , 10_1101-2021_01_02_425006 11 9 - - HYPH 10_1101-2021_01_02_425006 11 10 BY by IN 10_1101-2021_01_02_425006 11 11 - - HYPH 10_1101-2021_01_02_425006 11 12 NC NC NNP 10_1101-2021_01_02_425006 11 13 - - HYPH 10_1101-2021_01_02_425006 11 14 ND ND NNP 10_1101-2021_01_02_425006 11 15 4.0 4.0 CD 10_1101-2021_01_02_425006 11 16 International International NNP 10_1101-2021_01_02_425006 11 17 licenseavailable licenseavailable NN 10_1101-2021_01_02_425006 11 18 under under IN 10_1101-2021_01_02_425006 11 19 a a DT 10_1101-2021_01_02_425006 11 20 ( ( -LRB- 10_1101-2021_01_02_425006 11 21 which which WDT 10_1101-2021_01_02_425006 11 22 was be VBD 10_1101-2021_01_02_425006 11 23 not not RB 10_1101-2021_01_02_425006 11 24 certified certify VBN 10_1101-2021_01_02_425006 11 25 by by IN 10_1101-2021_01_02_425006 11 26 peer peer NN 10_1101-2021_01_02_425006 11 27 review review NN 10_1101-2021_01_02_425006 11 28 ) ) -RRB- 10_1101-2021_01_02_425006 11 29 is be VBZ 10_1101-2021_01_02_425006 11 30 the the DT 10_1101-2021_01_02_425006 11 31 author author NN 10_1101-2021_01_02_425006 11 32 / / SYM 10_1101-2021_01_02_425006 11 33 funder funder NN 10_1101-2021_01_02_425006 11 34 , , , 10_1101-2021_01_02_425006 11 35 who who WP 10_1101-2021_01_02_425006 11 36 has have VBZ 10_1101-2021_01_02_425006 11 37 granted grant VBN 10_1101-2021_01_02_425006 11 38 bioRxiv biorxiv IN 10_1101-2021_01_02_425006 11 39 a a DT 10_1101-2021_01_02_425006 11 40 license license NN 10_1101-2021_01_02_425006 11 41 to to TO 10_1101-2021_01_02_425006 11 42 display display VB 10_1101-2021_01_02_425006 11 43 the the DT 10_1101-2021_01_02_425006 11 44 
preprint preprint NN 10_1101-2021_01_02_425006 11 45 in in IN 10_1101-2021_01_02_425006 11 46 perpetuity perpetuity NN 10_1101-2021_01_02_425006 11 47 . . . 10_1101-2021_01_02_425006 12 1 It -PRON- PRP 10_1101-2021_01_02_425006 12 2 is be VBZ 10_1101-2021_01_02_425006 12 3 made make VBN 10_1101-2021_01_02_425006 12 4 The the DT 10_1101-2021_01_02_425006 12 5 copyright copyright NN 10_1101-2021_01_02_425006 12 6 holder holder NN 10_1101-2021_01_02_425006 12 7 for for IN 10_1101-2021_01_02_425006 12 8 this this DT 10_1101-2021_01_02_425006 12 9 preprintthis preprintthis NN 10_1101-2021_01_02_425006 12 10 version version NN 10_1101-2021_01_02_425006 12 11 posted post VBD 10_1101-2021_01_02_425006 12 12 January January NNP 10_1101-2021_01_02_425006 12 13 6 6 CD 10_1101-2021_01_02_425006 12 14 , , , 10_1101-2021_01_02_425006 12 15 2021 2021 CD 10_1101-2021_01_02_425006 12 16 . . . 10_1101-2021_01_02_425006 12 17 ; ; : 10_1101-2021_01_02_425006 12 18 https://doi.org/10.1101/2021.01.02.425006doi https://doi.org/10.1101/2021.01.02.425006doi XX 10_1101-2021_01_02_425006 12 19 : : : 10_1101-2021_01_02_425006 12 20 bioRxiv biorxiv VB 10_1101-2021_01_02_425006 12 21 preprint preprint NN 10_1101-2021_01_02_425006 12 22 https://doi.org/10.1101/2021.01.02.425006 https://doi.org/10.1101/2021.01.02.425006 CD 10_1101-2021_01_02_425006 12 23 http://creativecommons.org/licenses/by-nc-nd/4.0/ http://creativecommons.org/licenses/by-nc-nd/4.0/ NN 10_1101-2021_01_02_425006 12 24 2 2 CD 10_1101-2021_01_02_425006 12 25 programming programming NN 10_1101-2021_01_02_425006 12 26 model model NN 10_1101-2021_01_02_425006 12 27 to to TO 10_1101-2021_01_02_425006 12 28 seek seek VB 10_1101-2021_01_02_425006 12 29 an an DT 10_1101-2021_01_02_425006 12 30 optimum optimum JJ 10_1101-2021_01_02_425006 12 31 expression expression NN 10_1101-2021_01_02_425006 12 32 combination combination NN 10_1101-2021_01_02_425006 12 33 of of IN 10_1101-2021_01_02_425006 12 34 all all DT 10_1101-2021_01_02_425006 12 
35 of of IN 10_1101-2021_01_02_425006 12 36 the the DT 10_1101-2021_01_02_425006 12 37 to to TO 10_1101-2021_01_02_425006 12 38 - - HYPH 10_1101-2021_01_02_425006 12 39 be be VB 10_1101-2021_01_02_425006 12 40 - - HYPH 10_1101-2021_01_02_425006 12 41 identified identify VBN 10_1101-2021_01_02_425006 12 42 ATUs atu NNS 10_1101-2021_01_02_425006 12 43 . . . 10_1101-2021_01_02_425006 13 1 19 19 CD 10_1101-2021_01_02_425006 13 2 The the DT 10_1101-2021_01_02_425006 13 3 predicted predict VBN 10_1101-2021_01_02_425006 13 4 ATUs atu NNS 10_1101-2021_01_02_425006 13 5 in in IN 10_1101-2021_01_02_425006 13 6 E. E. NNP 10_1101-2021_01_02_425006 13 7 coli coli NNS 10_1101-2021_01_02_425006 13 8 reached reach VBD 10_1101-2021_01_02_425006 13 9 a a DT 10_1101-2021_01_02_425006 13 10 precision precision NN 10_1101-2021_01_02_425006 13 11 of of IN 10_1101-2021_01_02_425006 13 12 0.77/0.74 0.77/0.74 NN 10_1101-2021_01_02_425006 13 13 and and CC 10_1101-2021_01_02_425006 13 14 a a DT 10_1101-2021_01_02_425006 13 15 recall recall NN 10_1101-2021_01_02_425006 13 16 of of IN 10_1101-2021_01_02_425006 13 17 0.75/0.76 0.75/0.76 NN 10_1101-2021_01_02_425006 13 18 in in IN 10_1101-2021_01_02_425006 13 19 the the DT 10_1101-2021_01_02_425006 13 20 two two CD 10_1101-2021_01_02_425006 13 21 RNA-20 rna-20 CD 10_1101-2021_01_02_425006 13 22 Sequencing sequencing NN 10_1101-2021_01_02_425006 13 23 datasets dataset NNS 10_1101-2021_01_02_425006 13 24 compared compare VBN 10_1101-2021_01_02_425006 13 25 with with IN 10_1101-2021_01_02_425006 13 26 the the DT 10_1101-2021_01_02_425006 13 27 benchmarked benchmarke VBN 10_1101-2021_01_02_425006 13 28 ATUs atu NNS 10_1101-2021_01_02_425006 13 29 from from IN 10_1101-2021_01_02_425006 13 30 third third JJ 10_1101-2021_01_02_425006 13 31 - - HYPH 10_1101-2021_01_02_425006 13 32 generation generation NN 10_1101-2021_01_02_425006 13 33 RNA RNA NNP 10_1101-2021_01_02_425006 13 34 - - HYPH 10_1101-2021_01_02_425006 13 35 Seq Seq NNP 
10_1101-2021_01_02_425006 13 36 data datum NNS 10_1101-2021_01_02_425006 13 37 . . . 10_1101-2021_01_02_425006 14 1 We -PRON- PRP 10_1101-2021_01_02_425006 14 2 21 21 CD 10_1101-2021_01_02_425006 14 3 believe believe VBP 10_1101-2021_01_02_425006 14 4 that that IN 10_1101-2021_01_02_425006 14 5 the the DT 10_1101-2021_01_02_425006 14 6 ATUs atu NNS 10_1101-2021_01_02_425006 14 7 identified identify VBN 10_1101-2021_01_02_425006 14 8 by by IN 10_1101-2021_01_02_425006 14 9 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 14 10 can can MD 10_1101-2021_01_02_425006 14 11 provide provide VB 10_1101-2021_01_02_425006 14 12 fundamental fundamental JJ 10_1101-2021_01_02_425006 14 13 knowledge knowledge NN 10_1101-2021_01_02_425006 14 14 to to TO 10_1101-2021_01_02_425006 14 15 guide guide VB 10_1101-2021_01_02_425006 14 16 the the DT 10_1101-2021_01_02_425006 14 17 22 22 CD 10_1101-2021_01_02_425006 14 18 reconstruction reconstruction NN 10_1101-2021_01_02_425006 14 19 of of IN 10_1101-2021_01_02_425006 14 20 transcriptional transcriptional JJ 10_1101-2021_01_02_425006 14 21 regulatory regulatory JJ 10_1101-2021_01_02_425006 14 22 networks network NNS 10_1101-2021_01_02_425006 14 23 in in IN 10_1101-2021_01_02_425006 14 24 bacterial bacterial JJ 10_1101-2021_01_02_425006 14 25 genomes genome NNS 10_1101-2021_01_02_425006 14 26 . . . 
10_1101-2021_01_02_425006 15 1 23 23 CD 10_1101-2021_01_02_425006 15 2 INTRODUCTION introduction NN 10_1101-2021_01_02_425006 15 3 24 24 CD 10_1101-2021_01_02_425006 15 4 An an DT 10_1101-2021_01_02_425006 15 5 operon operon NN 10_1101-2021_01_02_425006 15 6 in in IN 10_1101-2021_01_02_425006 15 7 bacterial bacterial JJ 10_1101-2021_01_02_425006 15 8 genomes genome NNS 10_1101-2021_01_02_425006 15 9 is be VBZ 10_1101-2021_01_02_425006 15 10 defined define VBN 10_1101-2021_01_02_425006 15 11 as as IN 10_1101-2021_01_02_425006 15 12 a a DT 10_1101-2021_01_02_425006 15 13 group group NN 10_1101-2021_01_02_425006 15 14 of of IN 10_1101-2021_01_02_425006 15 15 consecutive consecutive JJ 10_1101-2021_01_02_425006 15 16 genes gene NNS 10_1101-2021_01_02_425006 15 17 regulated regulate VBN 10_1101-2021_01_02_425006 15 18 by by IN 10_1101-2021_01_02_425006 15 19 a a DT 10_1101-2021_01_02_425006 15 20 common common JJ 10_1101-2021_01_02_425006 15 21 25 25 CD 10_1101-2021_01_02_425006 15 22 promoter promoter NN 10_1101-2021_01_02_425006 15 23 that that WDT 10_1101-2021_01_02_425006 15 24 all all DT 10_1101-2021_01_02_425006 15 25 share share VBP 10_1101-2021_01_02_425006 15 26 the the DT 10_1101-2021_01_02_425006 15 27 same same JJ 10_1101-2021_01_02_425006 15 28 terminator terminator NN 10_1101-2021_01_02_425006 15 29 ( ( -LRB- 10_1101-2021_01_02_425006 15 30 1 1 CD 10_1101-2021_01_02_425006 15 31 ) ) -RRB- 10_1101-2021_01_02_425006 15 32 . . . 
10_1101-2021_01_02_425006 16 1 Genes gene NNS 10_1101-2021_01_02_425006 16 2 in in IN 10_1101-2021_01_02_425006 16 3 the the DT 10_1101-2021_01_02_425006 16 4 same same JJ 10_1101-2021_01_02_425006 16 5 operon operon NN 10_1101-2021_01_02_425006 16 6 generally generally RB 10_1101-2021_01_02_425006 16 7 encode encode VBP 10_1101-2021_01_02_425006 16 8 proteins protein NNS 10_1101-2021_01_02_425006 16 9 26 26 CD 10_1101-2021_01_02_425006 16 10 with with IN 10_1101-2021_01_02_425006 16 11 relevant relevant JJ 10_1101-2021_01_02_425006 16 12 or or CC 10_1101-2021_01_02_425006 16 13 similar similar JJ 10_1101-2021_01_02_425006 16 14 biological biological JJ 10_1101-2021_01_02_425006 16 15 functions function NNS 10_1101-2021_01_02_425006 16 16 ; ; : 10_1101-2021_01_02_425006 16 17 e.g. e.g. RB 10_1101-2021_01_02_425006 16 18 , , , 10_1101-2021_01_02_425006 16 19 lacZ lacZ NNP 10_1101-2021_01_02_425006 16 20 , , , 10_1101-2021_01_02_425006 16 21 lacY lacy CD 10_1101-2021_01_02_425006 16 22 , , , 10_1101-2021_01_02_425006 16 23 and and CC 10_1101-2021_01_02_425006 16 24 lacA lacA VBD 10_1101-2021_01_02_425006 16 25 in in IN 10_1101-2021_01_02_425006 16 26 the the DT 10_1101-2021_01_02_425006 16 27 lac lac NNP 10_1101-2021_01_02_425006 16 28 operon operon NNP 10_1101-2021_01_02_425006 16 29 encode encode NNP 10_1101-2021_01_02_425006 16 30 proteins protein VBZ 10_1101-2021_01_02_425006 16 31 27 27 CD 10_1101-2021_01_02_425006 16 32 that that WDT 10_1101-2021_01_02_425006 16 33 help help VBP 10_1101-2021_01_02_425006 16 34 cells cell NNS 10_1101-2021_01_02_425006 16 35 use use VB 10_1101-2021_01_02_425006 16 36 lactose lactose NN 10_1101-2021_01_02_425006 16 37 ( ( -LRB- 10_1101-2021_01_02_425006 16 38 1 1 CD 10_1101-2021_01_02_425006 16 39 , , , 10_1101-2021_01_02_425006 16 40 2 2 CD 10_1101-2021_01_02_425006 16 41 ) ) -RRB- 10_1101-2021_01_02_425006 16 42 . . . 
10_1101-2021_01_02_425006 17 1 With with IN 10_1101-2021_01_02_425006 17 2 decades decade NNS 10_1101-2021_01_02_425006 17 3 of of IN 10_1101-2021_01_02_425006 17 4 research research NN 10_1101-2021_01_02_425006 17 5 on on IN 10_1101-2021_01_02_425006 17 6 bacterial bacterial JJ 10_1101-2021_01_02_425006 17 7 transcriptional transcriptional JJ 10_1101-2021_01_02_425006 17 8 regulation regulation NN 10_1101-2021_01_02_425006 17 9 , , , 10_1101-2021_01_02_425006 17 10 the the DT 10_1101-2021_01_02_425006 17 11 28 28 CD 10_1101-2021_01_02_425006 17 12 operon operon NNP 10_1101-2021_01_02_425006 17 13 model model NN 10_1101-2021_01_02_425006 17 14 has have VBZ 10_1101-2021_01_02_425006 17 15 been be VBN 10_1101-2021_01_02_425006 17 16 found find VBN 10_1101-2021_01_02_425006 17 17 to to TO 10_1101-2021_01_02_425006 17 18 have have VB 10_1101-2021_01_02_425006 17 19 complex complex JJ 10_1101-2021_01_02_425006 17 20 mechanisms mechanism NNS 10_1101-2021_01_02_425006 17 21 that that WDT 10_1101-2021_01_02_425006 17 22 control control VBP 10_1101-2021_01_02_425006 17 23 expression expression NN 10_1101-2021_01_02_425006 17 24 ( ( -LRB- 10_1101-2021_01_02_425006 17 25 3 3 CD 10_1101-2021_01_02_425006 17 26 - - SYM 10_1101-2021_01_02_425006 17 27 5 5 CD 10_1101-2021_01_02_425006 17 28 ) ) -RRB- 10_1101-2021_01_02_425006 17 29 . . . 
10_1101-2021_01_02_425006 18 1 Multiple multiple JJ 10_1101-2021_01_02_425006 18 2 29 29 CD 10_1101-2021_01_02_425006 18 3 studies study NNS 10_1101-2021_01_02_425006 18 4 have have VBP 10_1101-2021_01_02_425006 18 5 shown show VBN 10_1101-2021_01_02_425006 18 6 that that IN 10_1101-2021_01_02_425006 18 7 bacterial bacterial JJ 10_1101-2021_01_02_425006 18 8 genes gene NNS 10_1101-2021_01_02_425006 18 9 are be VBP 10_1101-2021_01_02_425006 18 10 dynamically dynamically RB 10_1101-2021_01_02_425006 18 11 transcribed transcribe VBN 10_1101-2021_01_02_425006 18 12 under under IN 10_1101-2021_01_02_425006 18 13 different different JJ 10_1101-2021_01_02_425006 18 14 triggering trigger VBG 10_1101-2021_01_02_425006 18 15 30 30 CD 10_1101-2021_01_02_425006 18 16 conditions condition NNS 10_1101-2021_01_02_425006 18 17 , , , 10_1101-2021_01_02_425006 18 18 leading lead VBG 10_1101-2021_01_02_425006 18 19 to to IN 10_1101-2021_01_02_425006 18 20 shared share VBN 10_1101-2021_01_02_425006 18 21 genes gene NNS 10_1101-2021_01_02_425006 18 22 among among IN 10_1101-2021_01_02_425006 18 23 different different JJ 10_1101-2021_01_02_425006 18 24 mRNA mRNA NNS 10_1101-2021_01_02_425006 18 25 transcripts transcript NNS 10_1101-2021_01_02_425006 18 26 ( ( -LRB- 10_1101-2021_01_02_425006 18 27 6 6 CD 10_1101-2021_01_02_425006 18 28 - - SYM 10_1101-2021_01_02_425006 18 29 8 8 CD 10_1101-2021_01_02_425006 18 30 ) ) -RRB- 10_1101-2021_01_02_425006 18 31 . . . 
10_1101-2021_01_02_425006 19 1 This this DT 10_1101-2021_01_02_425006 19 2 dynamic dynamic JJ 10_1101-2021_01_02_425006 19 3 architecture architecture NN 10_1101-2021_01_02_425006 19 4 31 31 CD 10_1101-2021_01_02_425006 19 5 can can MD 10_1101-2021_01_02_425006 19 6 be be VB 10_1101-2021_01_02_425006 19 7 redefined redefine VBN 10_1101-2021_01_02_425006 19 8 by by IN 10_1101-2021_01_02_425006 19 9 all all DT 10_1101-2021_01_02_425006 19 10 of of IN 10_1101-2021_01_02_425006 19 11 the the DT 10_1101-2021_01_02_425006 19 12 alternative alternative JJ 10_1101-2021_01_02_425006 19 13 transcription transcription NN 10_1101-2021_01_02_425006 19 14 units unit NNS 10_1101-2021_01_02_425006 19 15 ( ( -LRB- 10_1101-2021_01_02_425006 19 16 a.k.a a.k.a JJ 10_1101-2021_01_02_425006 19 17 . . NNP 10_1101-2021_01_02_425006 19 18 , , , 10_1101-2021_01_02_425006 19 19 ATUs ATUs NNPS 10_1101-2021_01_02_425006 19 20 ) ) -RRB- 10_1101-2021_01_02_425006 19 21 ( ( -LRB- 10_1101-2021_01_02_425006 19 22 3 3 CD 10_1101-2021_01_02_425006 19 23 , , , 10_1101-2021_01_02_425006 19 24 5 5 CD 10_1101-2021_01_02_425006 19 25 ) ) -RRB- 10_1101-2021_01_02_425006 19 26 , , , 10_1101-2021_01_02_425006 19 27 and and CC 10_1101-2021_01_02_425006 19 28 more more JJR 10_1101-2021_01_02_425006 19 29 details detail NNS 10_1101-2021_01_02_425006 19 30 can can MD 10_1101-2021_01_02_425006 19 31 be be VB 10_1101-2021_01_02_425006 19 32 32 32 CD 10_1101-2021_01_02_425006 19 33 found find VBN 10_1101-2021_01_02_425006 19 34 in in IN 10_1101-2021_01_02_425006 19 35 fig fig NN 10_1101-2021_01_02_425006 19 36 . . . 10_1101-2021_01_02_425006 20 1 S1 S1 NNS 10_1101-2021_01_02_425006 20 2 . . . 
10_1101-2021_01_02_425006 21 1 33 33 CD 10_1101-2021_01_02_425006 21 2 ATU ATU NNP 10_1101-2021_01_02_425006 21 3 identification identification NN 10_1101-2021_01_02_425006 21 4 is be VBZ 10_1101-2021_01_02_425006 21 5 of of IN 10_1101-2021_01_02_425006 21 6 fundamental fundamental JJ 10_1101-2021_01_02_425006 21 7 importance importance NN 10_1101-2021_01_02_425006 21 8 for for IN 10_1101-2021_01_02_425006 21 9 understanding understand VBG 10_1101-2021_01_02_425006 21 10 the the DT 10_1101-2021_01_02_425006 21 11 transcriptional transcriptional JJ 10_1101-2021_01_02_425006 21 12 regulatory regulatory JJ 10_1101-2021_01_02_425006 21 13 34 34 CD 10_1101-2021_01_02_425006 21 14 mechanisms mechanism NNS 10_1101-2021_01_02_425006 21 15 of of IN 10_1101-2021_01_02_425006 21 16 bacteria bacteria NNS 10_1101-2021_01_02_425006 21 17 , , , 10_1101-2021_01_02_425006 21 18 and and CC 10_1101-2021_01_02_425006 21 19 these these DT 10_1101-2021_01_02_425006 21 20 dynamic dynamic JJ 10_1101-2021_01_02_425006 21 21 structures structure NNS 10_1101-2021_01_02_425006 21 22 have have VBP 10_1101-2021_01_02_425006 21 23 been be VBN 10_1101-2021_01_02_425006 21 24 demonstrated demonstrate VBN 10_1101-2021_01_02_425006 21 25 to to TO 10_1101-2021_01_02_425006 21 26 be be VB 10_1101-2021_01_02_425006 21 27 associated associate VBN 10_1101-2021_01_02_425006 21 28 with with IN 10_1101-2021_01_02_425006 21 29 35 35 CD 10_1101-2021_01_02_425006 21 30 human human JJ 10_1101-2021_01_02_425006 21 31 diseases disease NNS 10_1101-2021_01_02_425006 21 32 ( ( -LRB- 10_1101-2021_01_02_425006 21 33 9 9 CD 10_1101-2021_01_02_425006 21 34 - - SYM 10_1101-2021_01_02_425006 21 35 12 12 CD 10_1101-2021_01_02_425006 21 36 ) ) -RRB- 10_1101-2021_01_02_425006 21 37 . . . 
10_1101-2021_01_02_425006 22 1 For for IN 10_1101-2021_01_02_425006 22 2 example example NN 10_1101-2021_01_02_425006 22 3 , , , 10_1101-2021_01_02_425006 22 4 Bhat Bhat NNP 10_1101-2021_01_02_425006 22 5 et et NNP 10_1101-2021_01_02_425006 22 6 al al NNP 10_1101-2021_01_02_425006 22 7 . . . 10_1101-2021_01_02_425006 23 1 studied study VBD 10_1101-2021_01_02_425006 23 2 the the DT 10_1101-2021_01_02_425006 23 3 alr alr NNP 10_1101-2021_01_02_425006 23 4 - - HYPH 10_1101-2021_01_02_425006 23 5 groEL1 groel1 NN 10_1101-2021_01_02_425006 23 6 operon operon NNP 10_1101-2021_01_02_425006 23 7 , , , 10_1101-2021_01_02_425006 23 8 which which WDT 10_1101-2021_01_02_425006 23 9 is be VBZ 10_1101-2021_01_02_425006 23 10 essential essential JJ 10_1101-2021_01_02_425006 23 11 for for IN 10_1101-2021_01_02_425006 23 12 the the DT 10_1101-2021_01_02_425006 23 13 36 36 CD 10_1101-2021_01_02_425006 23 14 survival survival NN 10_1101-2021_01_02_425006 23 15 or or CC 10_1101-2021_01_02_425006 23 16 virulence virulence NN 10_1101-2021_01_02_425006 23 17 of of IN 10_1101-2021_01_02_425006 23 18 M. M. 
NNP 10_1101-2021_01_02_425006 23 19 tuberculosis tuberculosis NN 10_1101-2021_01_02_425006 23 20 ( ( -LRB- 10_1101-2021_01_02_425006 23 21 9 9 CD 10_1101-2021_01_02_425006 23 22 , , , 10_1101-2021_01_02_425006 23 23 11 11 CD 10_1101-2021_01_02_425006 23 24 ) ) -RRB- 10_1101-2021_01_02_425006 23 25 , , , 10_1101-2021_01_02_425006 23 26 the the DT 10_1101-2021_01_02_425006 23 27 causative causative JJ 10_1101-2021_01_02_425006 23 28 agent agent NN 10_1101-2021_01_02_425006 23 29 of of IN 10_1101-2021_01_02_425006 23 30 tuberculosis tuberculosis NN 10_1101-2021_01_02_425006 23 31 ( ( -LRB- 10_1101-2021_01_02_425006 23 32 TB TB NNP 10_1101-2021_01_02_425006 23 33 ) ) -RRB- 10_1101-2021_01_02_425006 23 34 , , , 10_1101-2021_01_02_425006 23 35 and and CC 10_1101-2021_01_02_425006 23 36 found find VBD 10_1101-2021_01_02_425006 23 37 that that IN 10_1101-2021_01_02_425006 23 38 37 37 CD 
10_1101-2021_01_02_425006 24 25 the the DT 10_1101-2021_01_02_425006 24 26 regulation regulation NN 10_1101-2021_01_02_425006 24 27 of of IN 10_1101-2021_01_02_425006 24 28 the the DT 10_1101-2021_01_02_425006 24 29 sub sub NN 10_1101-2021_01_02_425006 24 30 - - JJ 10_1101-2021_01_02_425006 24 31 operon operon NNP 10_1101-2021_01_02_425006 24 32 is be VBZ 10_1101-2021_01_02_425006 24 33 distinct distinct JJ 10_1101-2021_01_02_425006 24 34 from from IN 10_1101-2021_01_02_425006 24 35 the the DT 10_1101-2021_01_02_425006 24 36 main main JJ 10_1101-2021_01_02_425006 24 37 operon operon NN 10_1101-2021_01_02_425006 24 38 ( ( -LRB- 10_1101-2021_01_02_425006 24 39 alr alr NNP 10_1101-2021_01_02_425006 24 40 - - HYPH 10_1101-2021_01_02_425006 24 41 groEL1 groEL1 NNP 10_1101-2021_01_02_425006 24 42 operon operon NN 10_1101-2021_01_02_425006 24 43 ) ) -RRB- 10_1101-2021_01_02_425006 24 44 under under IN 10_1101-2021_01_02_425006 24 45 stress stress NN 10_1101-2021_01_02_425006 24 46 , , , 10_1101-2021_01_02_425006 24 47 38 38 CD 10_1101-2021_01_02_425006 24 48 especially especially RB 10_1101-2021_01_02_425006 24 49 during during IN 10_1101-2021_01_02_425006 24 50 heat heat NN 10_1101-2021_01_02_425006 24 51 shock shock NN 10_1101-2021_01_02_425006 24 52 , , , 10_1101-2021_01_02_425006 24 53 pH pH NNP 10_1101-2021_01_02_425006 24 54 , , , 10_1101-2021_01_02_425006 24 55 and and CC 
10_1101-2021_01_02_425006 24 56 SDS SDS NNP 10_1101-2021_01_02_425006 24 57 stresses stress NNS 10_1101-2021_01_02_425006 24 58 ( ( -LRB- 10_1101-2021_01_02_425006 24 59 9 9 CD 10_1101-2021_01_02_425006 24 60 ) ) -RRB- 10_1101-2021_01_02_425006 24 61 . . . 10_1101-2021_01_02_425006 25 1 Another another DT 10_1101-2021_01_02_425006 25 2 example example NN 10_1101-2021_01_02_425006 25 3 is be VBZ 10_1101-2021_01_02_425006 25 4 Helicobacter Helicobacter NNP 10_1101-2021_01_02_425006 25 5 pylori pylori NN 10_1101-2021_01_02_425006 25 6 , , , 10_1101-2021_01_02_425006 25 7 a a DT 10_1101-2021_01_02_425006 25 8 39 39 CD 10_1101-2021_01_02_425006 25 9 gastric gastric JJ 10_1101-2021_01_02_425006 25 10 pathogen pathogen NN 10_1101-2021_01_02_425006 25 11 that that WDT 10_1101-2021_01_02_425006 25 12 is be VBZ 10_1101-2021_01_02_425006 25 13 the the DT 10_1101-2021_01_02_425006 25 14 primary primary JJ 10_1101-2021_01_02_425006 25 15 known know VBN 10_1101-2021_01_02_425006 25 16 risk risk NN 10_1101-2021_01_02_425006 25 17 factor factor NN 10_1101-2021_01_02_425006 25 18 for for IN 10_1101-2021_01_02_425006 25 19 gastric gastric JJ 10_1101-2021_01_02_425006 25 20 cancer cancer NN 10_1101-2021_01_02_425006 25 21 ( ( -LRB- 10_1101-2021_01_02_425006 25 22 12 12 CD 10_1101-2021_01_02_425006 25 23 ) ) -RRB- 10_1101-2021_01_02_425006 25 24 . . . 10_1101-2021_01_02_425006 26 1 Sharma Sharma NNP 10_1101-2021_01_02_425006 26 2 et et NNP 10_1101-2021_01_02_425006 26 3 al al NNP 10_1101-2021_01_02_425006 26 4 . . . 
10_1101-2021_01_02_425006 27 1 found find VBD 10_1101-2021_01_02_425006 27 2 an an DT 10_1101-2021_01_02_425006 27 3 40 40 CD 10_1101-2021_01_02_425006 27 4 acid acid NN 10_1101-2021_01_02_425006 27 5 - - HYPH 10_1101-2021_01_02_425006 27 6 induced induce VBN 10_1101-2021_01_02_425006 27 7 sub sub JJ 10_1101-2021_01_02_425006 27 8 - - HYPH 10_1101-2021_01_02_425006 27 9 operon operon JJ 10_1101-2021_01_02_425006 27 10 cag22 cag22 NNP 10_1101-2021_01_02_425006 27 11 - - SYM 10_1101-2021_01_02_425006 27 12 18 18 CD 10_1101-2021_01_02_425006 27 13 transcribed transcribe VBN 10_1101-2021_01_02_425006 27 14 from from IN 10_1101-2021_01_02_425006 27 15 the the DT 10_1101-2021_01_02_425006 27 16 primary primary JJ 10_1101-2021_01_02_425006 27 17 cag25 cag25 -RRB- 10_1101-2021_01_02_425006 27 18 - - SYM 10_1101-2021_01_02_425006 27 19 18 18 CD 10_1101-2021_01_02_425006 27 20 operon operon NN 10_1101-2021_01_02_425006 27 21 in in IN 10_1101-2021_01_02_425006 27 22 the the DT 10_1101-2021_01_02_425006 27 23 cag cag NNP 10_1101-2021_01_02_425006 27 24 41 41 CD 10_1101-2021_01_02_425006 27 25 pathogenicity pathogenicity NN 10_1101-2021_01_02_425006 27 26 island island NN 10_1101-2021_01_02_425006 27 27 of of IN 10_1101-2021_01_02_425006 27 28 the the DT 10_1101-2021_01_02_425006 27 29 H. H. NNP 10_1101-2021_01_02_425006 27 30 pylori pylori NN 10_1101-2021_01_02_425006 27 31 genome genome NN 10_1101-2021_01_02_425006 27 32 under under IN 10_1101-2021_01_02_425006 27 33 acid acid JJ 10_1101-2021_01_02_425006 27 34 stress stress NN 10_1101-2021_01_02_425006 27 35 ( ( -LRB- 10_1101-2021_01_02_425006 27 36 10 10 CD 10_1101-2021_01_02_425006 27 37 ) ) -RRB- 10_1101-2021_01_02_425006 27 38 . . . 
10_1101-2021_01_02_425006 28 1 The the DT 10_1101-2021_01_02_425006 28 2 mechanism mechanism NN 10_1101-2021_01_02_425006 28 3 of of IN 10_1101-2021_01_02_425006 28 4 the the DT 10_1101-2021_01_02_425006 28 5 complex complex JJ 10_1101-2021_01_02_425006 28 6 ATU ATU NNP 10_1101-2021_01_02_425006 28 7 42 42 CD 10_1101-2021_01_02_425006 28 8 structure structure NN 10_1101-2021_01_02_425006 28 9 in in IN 10_1101-2021_01_02_425006 28 10 these these DT 10_1101-2021_01_02_425006 28 11 pathogenic pathogenic JJ 10_1101-2021_01_02_425006 28 12 bacteria bacteria NNS 10_1101-2021_01_02_425006 28 13 can can MD 10_1101-2021_01_02_425006 28 14 help help VB 10_1101-2021_01_02_425006 28 15 us -PRON- PRP 10_1101-2021_01_02_425006 28 16 to to TO 10_1101-2021_01_02_425006 28 17 study study VB 10_1101-2021_01_02_425006 28 18 the the DT 10_1101-2021_01_02_425006 28 19 emergence emergence NN 10_1101-2021_01_02_425006 28 20 of of IN 10_1101-2021_01_02_425006 28 21 human human JJ 10_1101-2021_01_02_425006 28 22 diseases disease NNS 10_1101-2021_01_02_425006 28 23 caused cause VBN 10_1101-2021_01_02_425006 28 24 by by IN 10_1101-2021_01_02_425006 28 25 43 43 CD 10_1101-2021_01_02_425006 28 26 bacterial bacterial JJ 10_1101-2021_01_02_425006 28 27 organisms organism NNS 10_1101-2021_01_02_425006 28 28 . . . 10_1101-2021_01_02_425006 29 1 44 44 CD 10_1101-2021_01_02_425006 29 2 Several several JJ 10_1101-2021_01_02_425006 29 3 newly newly RB 10_1101-2021_01_02_425006 29 4 developed develop VBN 10_1101-2021_01_02_425006 29 5 techniques technique NNS 10_1101-2021_01_02_425006 29 6 have have VBP 10_1101-2021_01_02_425006 29 7 provided provide VBN 10_1101-2021_01_02_425006 29 8 a a DT 10_1101-2021_01_02_425006 29 9 comprehensive comprehensive JJ 10_1101-2021_01_02_425006 29 10 view view NN 10_1101-2021_01_02_425006 29 11 of of IN 10_1101-2021_01_02_425006 29 12 the the DT 10_1101-2021_01_02_425006 29 13 E. E. 
NNP 10_1101-2021_01_02_425006 29 14 coli coli VBZ 10_1101-2021_01_02_425006 29 15 45 45 CD 10_1101-2021_01_02_425006 29 16 transcriptome transcriptome DT 10_1101-2021_01_02_425006 29 17 by by IN 10_1101-2021_01_02_425006 29 18 identifying identify VBG 10_1101-2021_01_02_425006 29 19 full full JJ 10_1101-2021_01_02_425006 29 20 - - HYPH 10_1101-2021_01_02_425006 29 21 length length NN 10_1101-2021_01_02_425006 29 22 primary primary JJ 10_1101-2021_01_02_425006 29 23 transcripts transcript NNS 10_1101-2021_01_02_425006 29 24 ( ( -LRB- 10_1101-2021_01_02_425006 29 25 13 13 CD 10_1101-2021_01_02_425006 29 26 - - SYM 10_1101-2021_01_02_425006 29 27 17 17 CD 10_1101-2021_01_02_425006 29 28 ) ) -RRB- 10_1101-2021_01_02_425006 29 29 . . . 10_1101-2021_01_02_425006 30 1 For for IN 10_1101-2021_01_02_425006 30 2 example example NN 10_1101-2021_01_02_425006 30 3 , , , 10_1101-2021_01_02_425006 30 4 SMRT SMRT NNP 10_1101-2021_01_02_425006 30 5 - - HYPH 10_1101-2021_01_02_425006 30 6 Cappable cappable JJ 10_1101-2021_01_02_425006 30 7 - - HYPH 10_1101-2021_01_02_425006 30 8 seq seq NN 10_1101-2021_01_02_425006 30 9 46 46 CD 10_1101-2021_01_02_425006 30 10 ( ( -LRB- 10_1101-2021_01_02_425006 30 11 6 6 CD 10_1101-2021_01_02_425006 30 12 ) ) -RRB- 10_1101-2021_01_02_425006 30 13 combines combine VBZ 10_1101-2021_01_02_425006 30 14 the the DT 10_1101-2021_01_02_425006 30 15 isolation isolation NN 10_1101-2021_01_02_425006 30 16 of of IN 10_1101-2021_01_02_425006 30 17 the the DT 10_1101-2021_01_02_425006 30 18 full full JJ 10_1101-2021_01_02_425006 30 19 - - HYPH 10_1101-2021_01_02_425006 30 20 length length NN 10_1101-2021_01_02_425006 30 21 bacterial bacterial JJ 10_1101-2021_01_02_425006 30 22 primary primary NN 10_1101-2021_01_02_425006 30 23 transcriptome transcriptome VBN 10_1101-2021_01_02_425006 30 24 with with IN 10_1101-2021_01_02_425006 30 25 PacBio PacBio NNP 10_1101-2021_01_02_425006 30 26 SMRT SMRT NNP 10_1101-2021_01_02_425006 30 27 ( ( -LRB- 
10_1101-2021_01_02_425006 30 28 Single Single NNP 10_1101-2021_01_02_425006 30 29 47 47 CD 10_1101-2021_01_02_425006 30 30 Molecule Molecule NNP 10_1101-2021_01_02_425006 30 31 , , , 10_1101-2021_01_02_425006 30 32 Real Real NNP 10_1101-2021_01_02_425006 30 33 - - HYPH 10_1101-2021_01_02_425006 30 34 Time time NN 10_1101-2021_01_02_425006 30 35 ) ) -RRB- 10_1101-2021_01_02_425006 30 36 sequencing sequence VBG 10_1101-2021_01_02_425006 30 37 ( ( -LRB- 10_1101-2021_01_02_425006 30 38 6 6 CD 10_1101-2021_01_02_425006 30 39 ) ) -RRB- 10_1101-2021_01_02_425006 30 40 , , , 10_1101-2021_01_02_425006 30 41 and and CC 10_1101-2021_01_02_425006 30 42 simultaneous simultaneous JJ 10_1101-2021_01_02_425006 30 43 5 5 CD 10_1101-2021_01_02_425006 30 44 ’ ' '' 10_1101-2021_01_02_425006 30 45 and and CC 10_1101-2021_01_02_425006 30 46 3 3 CD 10_1101-2021_01_02_425006 30 47 ’ ' '' 10_1101-2021_01_02_425006 30 48 end end NN 10_1101-2021_01_02_425006 30 49 sequencing sequencing NN 10_1101-2021_01_02_425006 30 50 ( ( -LRB- 10_1101-2021_01_02_425006 30 51 SEnd SEnd NNP 10_1101-2021_01_02_425006 30 52 - - HYPH 10_1101-2021_01_02_425006 30 53 seq seq NNP 10_1101-2021_01_02_425006 30 54 ) ) -RRB- 10_1101-2021_01_02_425006 30 55 ( ( -LRB- 10_1101-2021_01_02_425006 30 56 7 7 CD 10_1101-2021_01_02_425006 30 57 ) ) -RRB- 10_1101-2021_01_02_425006 30 58 48 48 CD 10_1101-2021_01_02_425006 30 59 captures capture NNS 10_1101-2021_01_02_425006 30 60 both both DT 10_1101-2021_01_02_425006 30 61 transcription transcription NN 10_1101-2021_01_02_425006 30 62 start start NN 10_1101-2021_01_02_425006 30 63 sites site NNS 10_1101-2021_01_02_425006 30 64 ( ( -LRB- 10_1101-2021_01_02_425006 30 65 TSSs TSSs NNP 10_1101-2021_01_02_425006 30 66 ) ) -RRB- 10_1101-2021_01_02_425006 30 67 and and CC 10_1101-2021_01_02_425006 30 68 transcription transcription NN 10_1101-2021_01_02_425006 30 69 termination termination NN 10_1101-2021_01_02_425006 30 70 sites site NNS 10_1101-2021_01_02_425006 30 71 ( ( -LRB- 
10_1101-2021_01_02_425006 30 72 TTSs TTSs NNPS 10_1101-2021_01_02_425006 30 73 ) ) -RRB- 10_1101-2021_01_02_425006 30 74 via via IN 10_1101-2021_01_02_425006 30 75 49 49 CD 10_1101-2021_01_02_425006 30 76 circularization circularization NN 10_1101-2021_01_02_425006 30 77 of of IN 10_1101-2021_01_02_425006 30 78 transcripts transcript NNS 10_1101-2021_01_02_425006 30 79 ( ( -LRB- 10_1101-2021_01_02_425006 30 80 17 17 CD 10_1101-2021_01_02_425006 30 81 ) ) -RRB- 10_1101-2021_01_02_425006 30 82 . . . 10_1101-2021_01_02_425006 31 1 Despite despite IN 10_1101-2021_01_02_425006 31 2 the the DT 10_1101-2021_01_02_425006 31 3 great great JJ 10_1101-2021_01_02_425006 31 4 progress progress NN 10_1101-2021_01_02_425006 31 5 in in IN 10_1101-2021_01_02_425006 31 6 experimental experimental JJ 10_1101-2021_01_02_425006 31 7 techniques technique NNS 10_1101-2021_01_02_425006 31 8 , , , 10_1101-2021_01_02_425006 31 9 there there EX 10_1101-2021_01_02_425006 31 10 are be VBP 10_1101-2021_01_02_425006 31 11 still still RB 10_1101-2021_01_02_425006 31 12 50 50 CD 10_1101-2021_01_02_425006 31 13 some some DT 10_1101-2021_01_02_425006 31 14 deficiencies deficiency NNS 10_1101-2021_01_02_425006 31 15 . . . 
10_1101-2021_01_02_425006 32 1 On on IN 10_1101-2021_01_02_425006 32 2 the the DT 10_1101-2021_01_02_425006 32 3 one one CD 10_1101-2021_01_02_425006 32 4 hand hand NN 10_1101-2021_01_02_425006 32 5 , , , 10_1101-2021_01_02_425006 32 6 the the DT 10_1101-2021_01_02_425006 32 7 read read VBN 10_1101-2021_01_02_425006 32 8 depth depth NN 10_1101-2021_01_02_425006 32 9 and and CC 10_1101-2021_01_02_425006 32 10 error error NN 10_1101-2021_01_02_425006 32 11 rate rate NN 10_1101-2021_01_02_425006 32 12 of of IN 10_1101-2021_01_02_425006 32 13 the the DT 10_1101-2021_01_02_425006 32 14 third third JJ 10_1101-2021_01_02_425006 32 15 - - HYPH 10_1101-2021_01_02_425006 32 16 generation generation NN 10_1101-2021_01_02_425006 32 17 sequencing sequence VBG 10_1101-2021_01_02_425006 32 18 51 51 CD 10_1101-2021_01_02_425006 32 19 used use VBN 10_1101-2021_01_02_425006 32 20 in in IN 10_1101-2021_01_02_425006 32 21 SMRT SMRT NNP 10_1101-2021_01_02_425006 32 22 - - HYPH 10_1101-2021_01_02_425006 32 23 Cappable cappable JJ 10_1101-2021_01_02_425006 32 24 - - HYPH 10_1101-2021_01_02_425006 32 25 seq seq NN 10_1101-2021_01_02_425006 32 26 have have VBP 10_1101-2021_01_02_425006 32 27 an an DT 10_1101-2021_01_02_425006 32 28 impact impact NN 10_1101-2021_01_02_425006 32 29 on on IN 10_1101-2021_01_02_425006 32 30 ATU ATU NNP 10_1101-2021_01_02_425006 32 31 prediction prediction NN 10_1101-2021_01_02_425006 32 32 compared compare VBN 10_1101-2021_01_02_425006 32 33 with with IN 10_1101-2021_01_02_425006 32 34 Illumina Illumina NNP 10_1101-2021_01_02_425006 32 35 - - HYPH 10_1101-2021_01_02_425006 32 36 based base VBN 10_1101-2021_01_02_425006 32 37 RNA-52 rna-52 CD 10_1101-2021_01_02_425006 32 38 Seq Seq NNP 10_1101-2021_01_02_425006 32 39 ( ( -LRB- 10_1101-2021_01_02_425006 32 40 7 7 CD 10_1101-2021_01_02_425006 32 41 , , , 10_1101-2021_01_02_425006 32 42 18 18 CD 10_1101-2021_01_02_425006 32 43 ) ) -RRB- 10_1101-2021_01_02_425006 32 44 . . . 
10_1101-2021_01_02_425006 33 1 On on IN 10_1101-2021_01_02_425006 33 2 the the DT 10_1101-2021_01_02_425006 33 3 other other JJ 10_1101-2021_01_02_425006 33 4 hand hand NN 10_1101-2021_01_02_425006 33 5 , , , 10_1101-2021_01_02_425006 33 6 the the DT 10_1101-2021_01_02_425006 33 7 time time NN 10_1101-2021_01_02_425006 33 8 - - HYPH 10_1101-2021_01_02_425006 33 9 consuming consume VBG 10_1101-2021_01_02_425006 33 10 , , , 10_1101-2021_01_02_425006 33 11 laborious laborious JJ 10_1101-2021_01_02_425006 33 12 , , , 10_1101-2021_01_02_425006 33 13 and and CC 10_1101-2021_01_02_425006 33 14 costly costly JJ 10_1101-2021_01_02_425006 33 15 properties property NNS 10_1101-2021_01_02_425006 33 16 of of IN 10_1101-2021_01_02_425006 33 17 these these DT 10_1101-2021_01_02_425006 33 18 53 53 CD 10_1101-2021_01_02_425006 33 19 experimental experimental JJ 10_1101-2021_01_02_425006 33 20 techniques technique NNS 10_1101-2021_01_02_425006 33 21 make make VBP 10_1101-2021_01_02_425006 33 22 them -PRON- PRP 10_1101-2021_01_02_425006 33 23 unrealistic unrealistic JJ 10_1101-2021_01_02_425006 33 24 to to TO 10_1101-2021_01_02_425006 33 25 be be VB 10_1101-2021_01_02_425006 33 26 generally generally RB 10_1101-2021_01_02_425006 33 27 applicable applicable JJ 10_1101-2021_01_02_425006 33 28 to to IN 10_1101-2021_01_02_425006 33 29 ATU ATU NNP 10_1101-2021_01_02_425006 33 30 predictions prediction NNS 10_1101-2021_01_02_425006 33 31 in in IN 10_1101-2021_01_02_425006 33 32 bacteria bacteria NNS 10_1101-2021_01_02_425006 33 33 54 54 CD 10_1101-2021_01_02_425006 33 34 under under IN 10_1101-2021_01_02_425006 33 35 specific specific JJ 10_1101-2021_01_02_425006 33 36 conditions condition NNS 10_1101-2021_01_02_425006 33 37 . . . 
10_1101-2021_01_02_425006 34 1 Thus thus RB 10_1101-2021_01_02_425006 34 2 , , , 10_1101-2021_01_02_425006 34 3 novel novel JJ 10_1101-2021_01_02_425006 34 4 and and CC 10_1101-2021_01_02_425006 34 5 robust robust JJ 10_1101-2021_01_02_425006 34 6 computational computational JJ 10_1101-2021_01_02_425006 34 7 methods method NNS 10_1101-2021_01_02_425006 34 8 for for IN 10_1101-2021_01_02_425006 34 9 ATU ATU NNP 10_1101-2021_01_02_425006 34 10 identification identification NN 10_1101-2021_01_02_425006 34 11 in in IN 10_1101-2021_01_02_425006 34 12 55 55 CD 10_1101-2021_01_02_425006 34 13 bacterial bacterial JJ 10_1101-2021_01_02_425006 34 14 genomes genome NNS 10_1101-2021_01_02_425006 34 15 based base VBN 10_1101-2021_01_02_425006 34 16 on on IN 10_1101-2021_01_02_425006 34 17 RNA RNA NNP 10_1101-2021_01_02_425006 34 18 - - HYPH 10_1101-2021_01_02_425006 34 19 Seq Seq NNP 10_1101-2021_01_02_425006 34 20 are be VBP 10_1101-2021_01_02_425006 34 21 urgently urgently RB 10_1101-2021_01_02_425006 34 22 needed need VBN 10_1101-2021_01_02_425006 34 23 . . . 
10_1101-2021_01_02_425006 36 25 Fortunately fortunately RB 10_1101-2021_01_02_425006 36 26 , , , 10_1101-2021_01_02_425006 36 27 many many JJ 10_1101-2021_01_02_425006 36 28 computational computational JJ 10_1101-2021_01_02_425006 36 29 studies study NNS 10_1101-2021_01_02_425006 36 30 have have VBP 10_1101-2021_01_02_425006 36 31 been be VBN 10_1101-2021_01_02_425006 36 32 carried carry VBN 10_1101-2021_01_02_425006 36 33 out out RP 10_1101-2021_01_02_425006 36 34 to to TO 10_1101-2021_01_02_425006 36 35 predict predict VB 10_1101-2021_01_02_425006 36 36 ATUs atu NNS 10_1101-2021_01_02_425006 36 37 in in IN 10_1101-2021_01_02_425006 36 38 bacteria bacteria NNS 
10_1101-2021_01_02_425006 36 39 , , , 10_1101-2021_01_02_425006 36 40 which which WDT 10_1101-2021_01_02_425006 36 41 57 57 CD 10_1101-2021_01_02_425006 36 42 have have VBP 10_1101-2021_01_02_425006 36 43 provided provide VBN 10_1101-2021_01_02_425006 36 44 some some DT 10_1101-2021_01_02_425006 36 45 preliminary preliminary JJ 10_1101-2021_01_02_425006 36 46 studies study NNS 10_1101-2021_01_02_425006 36 47 for for IN 10_1101-2021_01_02_425006 36 48 ATU ATU NNP 10_1101-2021_01_02_425006 36 49 prediction prediction NN 10_1101-2021_01_02_425006 36 50 . . . 10_1101-2021_01_02_425006 37 1 Several several JJ 10_1101-2021_01_02_425006 37 2 public public JJ 10_1101-2021_01_02_425006 37 3 databases database NNS 10_1101-2021_01_02_425006 37 4 , , , 10_1101-2021_01_02_425006 37 5 such such JJ 10_1101-2021_01_02_425006 37 6 as as IN 10_1101-2021_01_02_425006 37 7 58 58 CD 10_1101-2021_01_02_425006 37 8 RegulonDB regulondb NN 10_1101-2021_01_02_425006 37 9 ( ( -LRB- 10_1101-2021_01_02_425006 37 10 19 19 CD 10_1101-2021_01_02_425006 37 11 ) ) -RRB- 10_1101-2021_01_02_425006 37 12 , , , 10_1101-2021_01_02_425006 37 13 DBTBS(20 DBTBS(20 NNP 10_1101-2021_01_02_425006 37 14 ) ) -RRB- 10_1101-2021_01_02_425006 37 15 , , , 10_1101-2021_01_02_425006 37 16 MicrobesOnline MicrobesOnline NNP 10_1101-2021_01_02_425006 37 17 ( ( -LRB- 10_1101-2021_01_02_425006 37 18 21 21 CD 10_1101-2021_01_02_425006 37 19 ) ) -RRB- 10_1101-2021_01_02_425006 37 20 , , , 10_1101-2021_01_02_425006 37 21 DOOR DOOR NNP 10_1101-2021_01_02_425006 37 22 ( ( -LRB- 10_1101-2021_01_02_425006 37 23 22 22 CD 10_1101-2021_01_02_425006 37 24 , , , 10_1101-2021_01_02_425006 37 25 23 23 CD 10_1101-2021_01_02_425006 37 26 ) ) -RRB- 10_1101-2021_01_02_425006 37 27 , , , 10_1101-2021_01_02_425006 37 28 OperomeDB OperomeDB NNP 10_1101-2021_01_02_425006 37 29 ( ( -LRB- 10_1101-2021_01_02_425006 37 30 24 24 CD 10_1101-2021_01_02_425006 37 31 ) ) -RRB- 10_1101-2021_01_02_425006 37 32 , , , 10_1101-2021_01_02_425006 37 33 DMINDA 
DMINDA NNP 10_1101-2021_01_02_425006 37 34 2.0 2.0 CD 10_1101-2021_01_02_425006 37 35 59 59 CD 10_1101-2021_01_02_425006 37 36 ( ( -LRB- 10_1101-2021_01_02_425006 37 37 25 25 CD 10_1101-2021_01_02_425006 37 38 ) ) -RRB- 10_1101-2021_01_02_425006 37 39 , , , 10_1101-2021_01_02_425006 37 40 and and CC 10_1101-2021_01_02_425006 37 41 ProOpDB ProOpDB NNP 10_1101-2021_01_02_425006 37 42 ( ( -LRB- 10_1101-2021_01_02_425006 37 43 26 26 CD 10_1101-2021_01_02_425006 37 44 ) ) -RRB- 10_1101-2021_01_02_425006 37 45 , , , 10_1101-2021_01_02_425006 37 46 provide provide VB 10_1101-2021_01_02_425006 37 47 various various JJ 10_1101-2021_01_02_425006 37 48 levels level NNS 10_1101-2021_01_02_425006 37 49 of of IN 10_1101-2021_01_02_425006 37 50 operon operon NNP 10_1101-2021_01_02_425006 37 51 information information NN 10_1101-2021_01_02_425006 37 52 and and CC 10_1101-2021_01_02_425006 37 53 small small JJ 10_1101-2021_01_02_425006 37 54 amounts amount NNS 10_1101-2021_01_02_425006 37 55 of of IN 10_1101-2021_01_02_425006 37 56 ATU ATU NNP 10_1101-2021_01_02_425006 37 57 60 60 CD 10_1101-2021_01_02_425006 37 58 information information NN 10_1101-2021_01_02_425006 37 59 . . . 10_1101-2021_01_02_425006 38 1 However however RB 10_1101-2021_01_02_425006 38 2 , , , 10_1101-2021_01_02_425006 38 3 these these DT 10_1101-2021_01_02_425006 38 4 databases database NNS 10_1101-2021_01_02_425006 38 5 can can MD 10_1101-2021_01_02_425006 38 6 not not RB 10_1101-2021_01_02_425006 38 7 provide provide VB 10_1101-2021_01_02_425006 38 8 genome genome NN 10_1101-2021_01_02_425006 38 9 - - HYPH 10_1101-2021_01_02_425006 38 10 scale scale NN 10_1101-2021_01_02_425006 38 11 ATU ATU NNP 10_1101-2021_01_02_425006 38 12 information information NN 10_1101-2021_01_02_425006 38 13 under under IN 10_1101-2021_01_02_425006 38 14 specific specific JJ 10_1101-2021_01_02_425006 38 15 61 61 CD 10_1101-2021_01_02_425006 38 16 conditions condition NNS 10_1101-2021_01_02_425006 38 17 . . . 
10_1101-2021_01_02_425006 39 1 Some some DT 10_1101-2021_01_02_425006 39 2 computational computational JJ 10_1101-2021_01_02_425006 39 3 studies study NNS 10_1101-2021_01_02_425006 39 4 , , , 10_1101-2021_01_02_425006 39 5 including include VBG 10_1101-2021_01_02_425006 39 6 Rockhopper Rockhopper NNP 10_1101-2021_01_02_425006 39 7 ( ( -LRB- 10_1101-2021_01_02_425006 39 8 27 27 CD 10_1101-2021_01_02_425006 39 9 ) ) -RRB- 10_1101-2021_01_02_425006 39 10 , , , 10_1101-2021_01_02_425006 39 11 SeqTU SeqTU NNP 10_1101-2021_01_02_425006 39 12 ( ( -LRB- 10_1101-2021_01_02_425006 39 13 4 4 CD 10_1101-2021_01_02_425006 39 14 , , , 10_1101-2021_01_02_425006 39 15 28 28 CD 10_1101-2021_01_02_425006 39 16 ) ) -RRB- 10_1101-2021_01_02_425006 39 17 , , , 10_1101-2021_01_02_425006 39 18 BAC-62 BAC-62 NNP 10_1101-2021_01_02_425006 39 19 BROWSER(29 browser(29 ADD 10_1101-2021_01_02_425006 39 20 ) ) -RRB- 10_1101-2021_01_02_425006 39 21 , , , 10_1101-2021_01_02_425006 39 22 rSeqTU rseqtu CD 10_1101-2021_01_02_425006 39 23 ( ( -LRB- 10_1101-2021_01_02_425006 39 24 5 5 CD 10_1101-2021_01_02_425006 39 25 ) ) -RRB- 10_1101-2021_01_02_425006 39 26 , , , 10_1101-2021_01_02_425006 39 27 and and CC 10_1101-2021_01_02_425006 39 28 Operon Operon NNP 10_1101-2021_01_02_425006 39 29 - - HYPH 10_1101-2021_01_02_425006 39 30 mapper mapper NNP 10_1101-2021_01_02_425006 39 31 ( ( -LRB- 10_1101-2021_01_02_425006 39 32 30 30 CD 10_1101-2021_01_02_425006 39 33 ) ) -RRB- 10_1101-2021_01_02_425006 39 34 , , , 10_1101-2021_01_02_425006 39 35 utilize utilize VB 10_1101-2021_01_02_425006 39 36 machine machine NN 10_1101-2021_01_02_425006 39 37 learning learning NN 10_1101-2021_01_02_425006 39 38 and and CC 10_1101-2021_01_02_425006 39 39 model model NN 10_1101-2021_01_02_425006 39 40 integration integration NN 10_1101-2021_01_02_425006 39 41 63 63 CD 10_1101-2021_01_02_425006 39 42 methods method NNS 10_1101-2021_01_02_425006 39 43 based base VBN 10_1101-2021_01_02_425006 39 44 on on IN 
10_1101-2021_01_02_425006 39 45 genomic genomic JJ 10_1101-2021_01_02_425006 39 46 information information NN 10_1101-2021_01_02_425006 39 47 and and CC 10_1101-2021_01_02_425006 39 48 gene gene NN 10_1101-2021_01_02_425006 39 49 expression expression NN 10_1101-2021_01_02_425006 39 50 profiles profile NNS 10_1101-2021_01_02_425006 39 51 to to TO 10_1101-2021_01_02_425006 39 52 identify identify VB 10_1101-2021_01_02_425006 39 53 bacterial bacterial JJ 10_1101-2021_01_02_425006 39 54 transcription transcription NN 10_1101-2021_01_02_425006 39 55 64 64 CD 10_1101-2021_01_02_425006 39 56 architecture architecture NN 10_1101-2021_01_02_425006 39 57 . . . 10_1101-2021_01_02_425006 40 1 However however RB 10_1101-2021_01_02_425006 40 2 , , , 10_1101-2021_01_02_425006 40 3 these these DT 10_1101-2021_01_02_425006 40 4 works work NNS 10_1101-2021_01_02_425006 40 5 still still RB 10_1101-2021_01_02_425006 40 6 can can MD 10_1101-2021_01_02_425006 40 7 not not RB 10_1101-2021_01_02_425006 40 8 solve solve VB 10_1101-2021_01_02_425006 40 9 the the DT 10_1101-2021_01_02_425006 40 10 dynamic dynamic JJ 10_1101-2021_01_02_425006 40 11 patterns pattern NNS 10_1101-2021_01_02_425006 40 12 and and CC 10_1101-2021_01_02_425006 40 13 overlapping overlap VBG 10_1101-2021_01_02_425006 40 14 nature nature NN 10_1101-2021_01_02_425006 40 15 of of IN 10_1101-2021_01_02_425006 40 16 65 65 CD 10_1101-2021_01_02_425006 40 17 ATUs atu NNS 10_1101-2021_01_02_425006 40 18 . . . 
10_1101-2021_01_02_425006 41 1 66 66 CD 10_1101-2021_01_02_425006 41 2 Here here RB 10_1101-2021_01_02_425006 41 3 , , , 10_1101-2021_01_02_425006 41 4 we -PRON- PRP 10_1101-2021_01_02_425006 41 5 present present VBP 10_1101-2021_01_02_425006 41 6 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 41 7 , , , 10_1101-2021_01_02_425006 41 8 a a DT 10_1101-2021_01_02_425006 41 9 novel novel JJ 10_1101-2021_01_02_425006 41 10 computational computational JJ 10_1101-2021_01_02_425006 41 11 method method NN 10_1101-2021_01_02_425006 41 12 for for IN 10_1101-2021_01_02_425006 41 13 genome genome NN 10_1101-2021_01_02_425006 41 14 - - HYPH 10_1101-2021_01_02_425006 41 15 scale scale NN 10_1101-2021_01_02_425006 41 16 ATU ATU NNP 10_1101-2021_01_02_425006 41 17 prediction prediction NN 10_1101-2021_01_02_425006 41 18 by by IN 10_1101-2021_01_02_425006 41 19 67 67 CD 10_1101-2021_01_02_425006 41 20 analyzing analyze VBG 10_1101-2021_01_02_425006 41 21 next- next- JJ 10_1101-2021_01_02_425006 41 22 and and CC 10_1101-2021_01_02_425006 41 23 third third JJ 10_1101-2021_01_02_425006 41 24 - - HYPH 10_1101-2021_01_02_425006 41 25 generation generation NN 10_1101-2021_01_02_425006 41 26 RNA RNA NNP 10_1101-2021_01_02_425006 41 27 - - HYPH 10_1101-2021_01_02_425006 41 28 Seq seq NN 10_1101-2021_01_02_425006 41 29 data datum NNS 10_1101-2021_01_02_425006 41 30 ( ( -LRB- 10_1101-2021_01_02_425006 41 31 Fig fig NN 10_1101-2021_01_02_425006 41 32 . . . 10_1101-2021_01_02_425006 42 1 1 1 CD 10_1101-2021_01_02_425006 42 2 and and CC 10_1101-2021_01_02_425006 42 3 table table NN 10_1101-2021_01_02_425006 42 4 S1 S1 NNS 10_1101-2021_01_02_425006 42 5 ) ) -RRB- 10_1101-2021_01_02_425006 42 6 . . . 
10_1101-2021_01_02_425006 43 1 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 43 2 utilizes utilize VBZ 10_1101-2021_01_02_425006 43 3 a a DT 10_1101-2021_01_02_425006 43 4 convex convex NNP 10_1101-2021_01_02_425006 43 5 68 68 CD 10_1101-2021_01_02_425006 43 6 quadratic quadratic JJ 10_1101-2021_01_02_425006 43 7 programming programming NN 10_1101-2021_01_02_425006 43 8 model model NN 10_1101-2021_01_02_425006 43 9 ( ( -LRB- 10_1101-2021_01_02_425006 43 10 CQP CQP NNP 10_1101-2021_01_02_425006 43 11 ) ) -RRB- 10_1101-2021_01_02_425006 43 12 and and CC 10_1101-2021_01_02_425006 43 13 aims aim VBZ 10_1101-2021_01_02_425006 43 14 to to TO 10_1101-2021_01_02_425006 43 15 provide provide VB 10_1101-2021_01_02_425006 43 16 the the DT 10_1101-2021_01_02_425006 43 17 optimum optimum JJ 10_1101-2021_01_02_425006 43 18 expression expression NN 10_1101-2021_01_02_425006 43 19 combination combination NN 10_1101-2021_01_02_425006 43 20 of of IN 10_1101-2021_01_02_425006 43 21 all all DT 10_1101-2021_01_02_425006 43 22 of of IN 10_1101-2021_01_02_425006 43 23 69 69 CD 10_1101-2021_01_02_425006 43 24 the the DT 10_1101-2021_01_02_425006 43 25 to to TO 10_1101-2021_01_02_425006 43 26 - - HYPH 10_1101-2021_01_02_425006 43 27 be be VB 10_1101-2021_01_02_425006 43 28 - - HYPH 10_1101-2021_01_02_425006 43 29 identified identify VBN 10_1101-2021_01_02_425006 43 30 ATUs atu NNS 10_1101-2021_01_02_425006 43 31 . . . 
10_1101-2021_01_02_425006 44 1 Specifically specifically RB 10_1101-2021_01_02_425006 44 2 , , , 10_1101-2021_01_02_425006 44 3 CQP CQP NNP 10_1101-2021_01_02_425006 44 4 minimizes minimize VBZ 10_1101-2021_01_02_425006 44 5 the the DT 10_1101-2021_01_02_425006 44 6 squared squared JJ 10_1101-2021_01_02_425006 44 7 error error NN 10_1101-2021_01_02_425006 44 8 between between IN 10_1101-2021_01_02_425006 44 9 the the DT 10_1101-2021_01_02_425006 44 10 predicted predict VBN 10_1101-2021_01_02_425006 44 11 70 70 CD 10_1101-2021_01_02_425006 44 12 expression expression NN 10_1101-2021_01_02_425006 44 13 level level NN 10_1101-2021_01_02_425006 44 14 of of IN 10_1101-2021_01_02_425006 44 15 ATUs atu NNS 10_1101-2021_01_02_425006 44 16 and and CC 10_1101-2021_01_02_425006 44 17 the the DT 10_1101-2021_01_02_425006 44 18 actual actual JJ 10_1101-2021_01_02_425006 44 19 expression expression NN 10_1101-2021_01_02_425006 44 20 levels level NNS 10_1101-2021_01_02_425006 44 21 in in IN 10_1101-2021_01_02_425006 44 22 genetic genetic JJ 10_1101-2021_01_02_425006 44 23 and and CC 10_1101-2021_01_02_425006 44 24 intergenic intergenic JJ 10_1101-2021_01_02_425006 44 25 regions region NNS 10_1101-2021_01_02_425006 44 26 . . . 
10_1101-2021_01_02_425006 45 1 It -PRON- PRP 10_1101-2021_01_02_425006 45 2 is be VBZ 10_1101-2021_01_02_425006 45 3 71 71 CD 10_1101-2021_01_02_425006 45 4 noteworthy noteworthy JJ 10_1101-2021_01_02_425006 45 5 that that IN 10_1101-2021_01_02_425006 45 6 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 45 7 also also RB 10_1101-2021_01_02_425006 45 8 utilizes utilize VBZ 10_1101-2021_01_02_425006 45 9 the the DT 10_1101-2021_01_02_425006 45 10 information information NN 10_1101-2021_01_02_425006 45 11 about about IN 10_1101-2021_01_02_425006 45 12 the the DT 10_1101-2021_01_02_425006 45 13 bias bias NN 10_1101-2021_01_02_425006 45 14 rate rate NN 10_1101-2021_01_02_425006 45 15 function function NN 10_1101-2021_01_02_425006 45 16 in in IN 10_1101-2021_01_02_425006 45 17 modeling model VBG 10_1101-2021_01_02_425006 45 18 non-72 non-72 JJ 10_1101-2021_01_02_425006 45 19 uniform uniform NN 10_1101-2021_01_02_425006 45 20 read read VBD 10_1101-2021_01_02_425006 45 21 distribution distribution NN 10_1101-2021_01_02_425006 45 22 as as IN 10_1101-2021_01_02_425006 45 23 the the DT 10_1101-2021_01_02_425006 45 24 linear linear JJ 10_1101-2021_01_02_425006 45 25 constraints constraint NNS 10_1101-2021_01_02_425006 45 26 of of IN 10_1101-2021_01_02_425006 45 27 CQP CQP NNP 10_1101-2021_01_02_425006 45 28 to to TO 10_1101-2021_01_02_425006 45 29 profile profile VB 10_1101-2021_01_02_425006 45 30 the the DT 10_1101-2021_01_02_425006 45 31 complexity complexity NN 10_1101-2021_01_02_425006 45 32 of of IN 10_1101-2021_01_02_425006 45 33 the the DT 10_1101-2021_01_02_425006 45 34 ATU ATU NNP 10_1101-2021_01_02_425006 45 35 73 73 CD 10_1101-2021_01_02_425006 45 36 architecture architecture NN 10_1101-2021_01_02_425006 45 37 . . . 
10_1101-2021_01_02_425006 46 1 Overall overall RB 10_1101-2021_01_02_425006 46 2 , , , 10_1101-2021_01_02_425006 46 3 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 46 4 provides provide VBZ 10_1101-2021_01_02_425006 46 5 a a DT 10_1101-2021_01_02_425006 46 6 generalized generalize VBN 10_1101-2021_01_02_425006 46 7 framework framework NN 10_1101-2021_01_02_425006 46 8 for for IN 10_1101-2021_01_02_425006 46 9 the the DT 10_1101-2021_01_02_425006 46 10 inference inference NN 10_1101-2021_01_02_425006 46 11 of of IN 10_1101-2021_01_02_425006 46 12 ATUs ATUs NNPS 10_1101-2021_01_02_425006 46 13 based base VBN 10_1101-2021_01_02_425006 46 14 on on IN 10_1101-2021_01_02_425006 46 15 74 74 CD 10_1101-2021_01_02_425006 46 16 next next JJ 10_1101-2021_01_02_425006 46 17 - - HYPH 10_1101-2021_01_02_425006 46 18 generation generation NN 10_1101-2021_01_02_425006 46 19 RNA RNA NNP 10_1101-2021_01_02_425006 46 20 - - HYPH 10_1101-2021_01_02_425006 46 21 Seq Seq NNP 10_1101-2021_01_02_425006 46 22 data datum NNS 10_1101-2021_01_02_425006 46 23 collected collect VBN 10_1101-2021_01_02_425006 46 24 under under IN 10_1101-2021_01_02_425006 46 25 multiple multiple JJ 10_1101-2021_01_02_425006 46 26 conditions condition NNS 10_1101-2021_01_02_425006 46 27 and and CC 10_1101-2021_01_02_425006 46 28 can can MD 10_1101-2021_01_02_425006 46 29 be be VB 10_1101-2021_01_02_425006 46 30 easily easily RB 10_1101-2021_01_02_425006 46 31 applied apply VBN 10_1101-2021_01_02_425006 46 32 to to IN 10_1101-2021_01_02_425006 46 33 any any DT 10_1101-2021_01_02_425006 46 34 75 75 CD 10_1101-2021_01_02_425006 46 35 .CC .CC , 10_1101-2021_01_02_425006 46 36 - - HYPH 10_1101-2021_01_02_425006 46 37 BY by IN 10_1101-2021_01_02_425006 46 38 - - HYPH 10_1101-2021_01_02_425006 46 39 NC NC NNP 10_1101-2021_01_02_425006 46 40 - - HYPH 10_1101-2021_01_02_425006 46 41 ND ND NNP 10_1101-2021_01_02_425006 46 42 4.0 4.0 CD 10_1101-2021_01_02_425006 46 43 International International NNP 10_1101-2021_01_02_425006 
46 44 licenseavailable licenseavailable NN 10_1101-2021_01_02_425006 46 45 under under IN 10_1101-2021_01_02_425006 46 46 a a DT 10_1101-2021_01_02_425006 46 47 ( ( -LRB- 10_1101-2021_01_02_425006 46 48 which which WDT 10_1101-2021_01_02_425006 46 49 was be VBD 10_1101-2021_01_02_425006 46 50 not not RB 10_1101-2021_01_02_425006 46 51 certified certify VBN 10_1101-2021_01_02_425006 46 52 by by IN 10_1101-2021_01_02_425006 46 53 peer peer NN 10_1101-2021_01_02_425006 46 54 review review NN 10_1101-2021_01_02_425006 46 55 ) ) -RRB- 10_1101-2021_01_02_425006 46 56 is be VBZ 10_1101-2021_01_02_425006 46 57 the the DT 10_1101-2021_01_02_425006 46 58 author author NN 10_1101-2021_01_02_425006 46 59 / / SYM 10_1101-2021_01_02_425006 46 60 funder funder NN 10_1101-2021_01_02_425006 46 61 , , , 10_1101-2021_01_02_425006 46 62 who who WP 10_1101-2021_01_02_425006 46 63 has have VBZ 10_1101-2021_01_02_425006 46 64 granted grant VBN 10_1101-2021_01_02_425006 46 65 bioRxiv biorxiv IN 10_1101-2021_01_02_425006 46 66 a a DT 10_1101-2021_01_02_425006 46 67 license license NN 10_1101-2021_01_02_425006 46 68 to to TO 10_1101-2021_01_02_425006 46 69 display display VB 10_1101-2021_01_02_425006 46 70 the the DT 10_1101-2021_01_02_425006 46 71 preprint preprint NN 10_1101-2021_01_02_425006 46 72 in in IN 10_1101-2021_01_02_425006 46 73 perpetuity perpetuity NN 10_1101-2021_01_02_425006 46 74 . . . 
10_1101-2021_01_02_425006 47 1 It -PRON- PRP 10_1101-2021_01_02_425006 47 2 is be VBZ 10_1101-2021_01_02_425006 47 3 made make VBN 10_1101-2021_01_02_425006 47 4 The the DT 10_1101-2021_01_02_425006 47 5 copyright copyright NN 10_1101-2021_01_02_425006 47 6 holder holder NN 10_1101-2021_01_02_425006 47 7 for for IN 10_1101-2021_01_02_425006 47 8 this this DT 10_1101-2021_01_02_425006 47 9 preprintthis preprintthis NN 10_1101-2021_01_02_425006 47 10 version version NN 10_1101-2021_01_02_425006 47 11 posted post VBD 10_1101-2021_01_02_425006 47 12 January January NNP 10_1101-2021_01_02_425006 47 13 6 6 CD 10_1101-2021_01_02_425006 47 14 , , , 10_1101-2021_01_02_425006 47 15 2021 2021 CD 10_1101-2021_01_02_425006 47 16 . . . 10_1101-2021_01_02_425006 47 17 ; ; : 10_1101-2021_01_02_425006 47 18 https://doi.org/10.1101/2021.01.02.425006doi https://doi.org/10.1101/2021.01.02.425006doi XX 10_1101-2021_01_02_425006 47 19 : : : 10_1101-2021_01_02_425006 47 20 bioRxiv biorxiv VB 10_1101-2021_01_02_425006 47 21 preprint preprint NN 10_1101-2021_01_02_425006 47 22 https://doi.org/10.1101/2021.01.02.425006 https://doi.org/10.1101/2021.01.02.425006 CD 10_1101-2021_01_02_425006 47 23 http://creativecommons.org/licenses/by-nc-nd/4.0/ http://creativecommons.org/licenses/by-nc-nd/4.0/ NN 10_1101-2021_01_02_425006 47 24 5 5 CD 10_1101-2021_01_02_425006 47 25 bacterial bacterial JJ 10_1101-2021_01_02_425006 47 26 organism organism NN 10_1101-2021_01_02_425006 47 27 to to TO 10_1101-2021_01_02_425006 47 28 identify identify VB 10_1101-2021_01_02_425006 47 29 the the DT 10_1101-2021_01_02_425006 47 30 ATU ATU NNP 10_1101-2021_01_02_425006 47 31 architecture architecture NN 10_1101-2021_01_02_425006 47 32 and and CC 10_1101-2021_01_02_425006 47 33 construct construct VB 10_1101-2021_01_02_425006 47 34 a a DT 10_1101-2021_01_02_425006 47 35 transcriptional transcriptional JJ 10_1101-2021_01_02_425006 47 36 regulatory regulatory JJ 10_1101-2021_01_02_425006 47 37 network network NN 
10_1101-2021_01_02_425006 47 38 . . . 10_1101-2021_01_02_425006 48 1 76 76 CD 10_1101-2021_01_02_425006 48 2 Please please UH 10_1101-2021_01_02_425006 48 3 place place VB 10_1101-2021_01_02_425006 48 4 Fig Fig NNP 10_1101-2021_01_02_425006 48 5 . . . 10_1101-2021_01_02_425006 49 1 1 1 CD 10_1101-2021_01_02_425006 49 2 here here RB 10_1101-2021_01_02_425006 49 3 . . . 10_1101-2021_01_02_425006 50 1 77 77 CD 10_1101-2021_01_02_425006 50 2 MATERIALS material NNS 10_1101-2021_01_02_425006 50 3 AND and CC 10_1101-2021_01_02_425006 50 4 METHODS METHODS NNP 10_1101-2021_01_02_425006 50 5 78 78 CD 10_1101-2021_01_02_425006 50 6 Data Data NNP 10_1101-2021_01_02_425006 50 7 collection collection NN 10_1101-2021_01_02_425006 50 8 79 79 CD 10_1101-2021_01_02_425006 50 9 The the DT 10_1101-2021_01_02_425006 50 10 two two CD 10_1101-2021_01_02_425006 50 11 Cappable Cappable NNP 10_1101-2021_01_02_425006 50 12 RNA RNA NNP 10_1101-2021_01_02_425006 50 13 - - HYPH 10_1101-2021_01_02_425006 50 14 Seq Seq NNP 10_1101-2021_01_02_425006 50 15 datasets dataset NNS 10_1101-2021_01_02_425006 50 16 used use VBN 10_1101-2021_01_02_425006 50 17 in in IN 10_1101-2021_01_02_425006 50 18 this this DT 10_1101-2021_01_02_425006 50 19 study study NN 10_1101-2021_01_02_425006 50 20 , , , 10_1101-2021_01_02_425006 50 21 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 50 22 and and CC 10_1101-2021_01_02_425006 50 23 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 50 24 , , , 10_1101-2021_01_02_425006 50 25 were be VBD 10_1101-2021_01_02_425006 50 26 80 80 CD 10_1101-2021_01_02_425006 50 27 obtained obtain VBN 10_1101-2021_01_02_425006 50 28 from from IN 10_1101-2021_01_02_425006 50 29 E. E. 
NNP 10_1101-2021_01_02_425006 50 30 coli coli NNS 10_1101-2021_01_02_425006 50 31 grown grow VBN 10_1101-2021_01_02_425006 50 32 under under IN 10_1101-2021_01_02_425006 50 33 two two CD 10_1101-2021_01_02_425006 50 34 different different JJ 10_1101-2021_01_02_425006 50 35 conditions condition NNS 10_1101-2021_01_02_425006 50 36 : : : 10_1101-2021_01_02_425006 50 37 M9 M9 NNP 10_1101-2021_01_02_425006 50 38 minimal minimal JJ 10_1101-2021_01_02_425006 50 39 medium medium NN 10_1101-2021_01_02_425006 50 40 and and CC 10_1101-2021_01_02_425006 50 41 Rich rich JJ 10_1101-2021_01_02_425006 50 42 medium medium NN 10_1101-2021_01_02_425006 50 43 , , , 10_1101-2021_01_02_425006 50 44 81 81 CD 10_1101-2021_01_02_425006 50 45 respectively respectively RB 10_1101-2021_01_02_425006 50 46 ( ( -LRB- 10_1101-2021_01_02_425006 50 47 6 6 CD 10_1101-2021_01_02_425006 50 48 ) ) -RRB- 10_1101-2021_01_02_425006 50 49 . . . 10_1101-2021_01_02_425006 51 1 The the DT 10_1101-2021_01_02_425006 51 2 full full JJ 10_1101-2021_01_02_425006 51 3 - - HYPH 10_1101-2021_01_02_425006 51 4 length length NN 10_1101-2021_01_02_425006 51 5 primary primary JJ 10_1101-2021_01_02_425006 51 6 transcripts transcript NNS 10_1101-2021_01_02_425006 51 7 were be VBD 10_1101-2021_01_02_425006 51 8 enriched enrich VBN 10_1101-2021_01_02_425006 51 9 as as IN 10_1101-2021_01_02_425006 51 10 described describe VBN 10_1101-2021_01_02_425006 51 11 in in IN 10_1101-2021_01_02_425006 51 12 ( ( -LRB- 10_1101-2021_01_02_425006 51 13 6 6 CD 10_1101-2021_01_02_425006 51 14 ) ) -RRB- 10_1101-2021_01_02_425006 51 15 with with IN 10_1101-2021_01_02_425006 51 16 modifications modification NNS 10_1101-2021_01_02_425006 51 17 82 82 CD 10_1101-2021_01_02_425006 51 18 to to TO 10_1101-2021_01_02_425006 51 19 be be VB 10_1101-2021_01_02_425006 51 20 adapted adapt VBN 10_1101-2021_01_02_425006 51 21 to to IN 10_1101-2021_01_02_425006 51 22 Illumina Illumina NNP 10_1101-2021_01_02_425006 51 23 sequencing sequencing NN 
10_1101-2021_01_02_425006 51 24 . . . 10_1101-2021_01_02_425006 52 1 The the DT 10_1101-2021_01_02_425006 52 2 capping cap VBG 10_1101-2021_01_02_425006 52 3 and and CC 10_1101-2021_01_02_425006 52 4 polyA polyA NNP 10_1101-2021_01_02_425006 52 5 tailing tailing NN 10_1101-2021_01_02_425006 52 6 were be VBD 10_1101-2021_01_02_425006 52 7 performed perform VBN 10_1101-2021_01_02_425006 52 8 as as IN 10_1101-2021_01_02_425006 52 9 described describe VBN 10_1101-2021_01_02_425006 52 10 in in IN 10_1101-2021_01_02_425006 52 11 ( ( -LRB- 10_1101-2021_01_02_425006 52 12 6 6 CD 10_1101-2021_01_02_425006 52 13 ) ) -RRB- 10_1101-2021_01_02_425006 52 14 . . . 10_1101-2021_01_02_425006 53 1 83 83 CD 10_1101-2021_01_02_425006 53 2 The the DT 10_1101-2021_01_02_425006 53 3 capped cap VBN 10_1101-2021_01_02_425006 53 4 RNA RNA NNP 10_1101-2021_01_02_425006 53 5 was be VBD 10_1101-2021_01_02_425006 53 6 enriched enrich VBN 10_1101-2021_01_02_425006 53 7 using use VBG 10_1101-2021_01_02_425006 53 8 hydrophilic hydrophilic JJ 10_1101-2021_01_02_425006 53 9 streptavidin streptavidin NNS 10_1101-2021_01_02_425006 53 10 magnetic magnetic JJ 10_1101-2021_01_02_425006 53 11 beads bead NNS 10_1101-2021_01_02_425006 53 12 ( ( -LRB- 10_1101-2021_01_02_425006 53 13 New New NNP 10_1101-2021_01_02_425006 53 14 England England NNP 10_1101-2021_01_02_425006 53 15 Biolabs Biolabs NNP 10_1101-2021_01_02_425006 53 16 ) ) -RRB- 10_1101-2021_01_02_425006 53 17 84 84 CD 10_1101-2021_01_02_425006 53 18 and and CC 10_1101-2021_01_02_425006 53 19 eluted elute VBN 10_1101-2021_01_02_425006 53 20 with with IN 10_1101-2021_01_02_425006 53 21 Biotin Biotin NNP 10_1101-2021_01_02_425006 53 22 using use VBG 10_1101-2021_01_02_425006 53 23 the the DT 10_1101-2021_01_02_425006 53 24 same same JJ 10_1101-2021_01_02_425006 53 25 condition condition NN 10_1101-2021_01_02_425006 53 26 ( ( -LRB- 10_1101-2021_01_02_425006 53 27 6 6 CD 10_1101-2021_01_02_425006 53 28 ) ) -RRB- 10_1101-2021_01_02_425006 53 29 . . . 
10_1101-2021_01_02_425006 54 1 Differently differently RB 10_1101-2021_01_02_425006 54 2 , , , 10_1101-2021_01_02_425006 54 3 the the DT 10_1101-2021_01_02_425006 54 4 eluted elute VBN 10_1101-2021_01_02_425006 54 5 RNA RNA NNP 10_1101-2021_01_02_425006 54 6 was be VBD 10_1101-2021_01_02_425006 54 7 enriched enrich VBN 10_1101-2021_01_02_425006 54 8 once once RB 10_1101-2021_01_02_425006 54 9 85 85 CD 10_1101-2021_01_02_425006 54 10 more more RBR 10_1101-2021_01_02_425006 54 11 using use VBG 10_1101-2021_01_02_425006 54 12 streptavidin streptavidin NNP 10_1101-2021_01_02_425006 54 13 beads bead NNS 10_1101-2021_01_02_425006 54 14 to to TO 10_1101-2021_01_02_425006 54 15 further far RBR 10_1101-2021_01_02_425006 54 16 remove remove VB 10_1101-2021_01_02_425006 54 17 processed process VBN 10_1101-2021_01_02_425006 54 18 RNA RNA NNP 10_1101-2021_01_02_425006 54 19 ( ( -LRB- 10_1101-2021_01_02_425006 54 20 e.g. e.g. RB 10_1101-2021_01_02_425006 54 21 , , , 10_1101-2021_01_02_425006 54 22 rRNA rrna CD 10_1101-2021_01_02_425006 54 23 ) ) -RRB- 10_1101-2021_01_02_425006 54 24 . . . 
10_1101-2021_01_02_425006 55 1 Subsequently subsequently RB 10_1101-2021_01_02_425006 55 2 , , , 10_1101-2021_01_02_425006 55 3 the the DT 10_1101-2021_01_02_425006 55 4 eluted elute VBN 10_1101-2021_01_02_425006 55 5 86 86 CD 10_1101-2021_01_02_425006 55 6 RNA RNA NNP 10_1101-2021_01_02_425006 55 7 was be VBD 10_1101-2021_01_02_425006 55 8 used use VBN 10_1101-2021_01_02_425006 55 9 for for IN 10_1101-2021_01_02_425006 55 10 library library NN 10_1101-2021_01_02_425006 55 11 preparation preparation NN 10_1101-2021_01_02_425006 55 12 using use VBG 10_1101-2021_01_02_425006 55 13 NEBNext nebnext JJ 10_1101-2021_01_02_425006 55 14 Ultra Ultra NNP 10_1101-2021_01_02_425006 55 15 II II NNP 10_1101-2021_01_02_425006 55 16 directional directional JJ 10_1101-2021_01_02_425006 55 17 RNA RNA NNP 10_1101-2021_01_02_425006 55 18 library library NN 10_1101-2021_01_02_425006 55 19 prep prep NN 10_1101-2021_01_02_425006 55 20 kit kit NNP 10_1101-2021_01_02_425006 55 21 ( ( -LRB- 10_1101-2021_01_02_425006 55 22 E7760 E7760 NNP 10_1101-2021_01_02_425006 55 23 ) ) -RRB- 10_1101-2021_01_02_425006 55 24 . . . 10_1101-2021_01_02_425006 56 1 87 87 CD 10_1101-2021_01_02_425006 56 2 Sequencing Sequencing NNP 10_1101-2021_01_02_425006 56 3 was be VBD 10_1101-2021_01_02_425006 56 4 performed perform VBN 10_1101-2021_01_02_425006 56 5 on on IN 10_1101-2021_01_02_425006 56 6 the the DT 10_1101-2021_01_02_425006 56 7 Illumina Illumina NNP 10_1101-2021_01_02_425006 56 8 Miseq Miseq NNP 10_1101-2021_01_02_425006 56 9 system system NN 10_1101-2021_01_02_425006 56 10 ( ( -LRB- 10_1101-2021_01_02_425006 56 11 paired pair VBN 10_1101-2021_01_02_425006 56 12 - - HYPH 10_1101-2021_01_02_425006 56 13 end end NN 10_1101-2021_01_02_425006 56 14 , , , 10_1101-2021_01_02_425006 56 15 100bp 100bp NNP 10_1101-2021_01_02_425006 56 16 ) ) -RRB- 10_1101-2021_01_02_425006 56 17 . . . 
10_1101-2021_01_02_425006 57 1 All all DT 10_1101-2021_01_02_425006 57 2 reads read NNS 10_1101-2021_01_02_425006 57 3 were be VBD 10_1101-2021_01_02_425006 57 4 mapped map VBN 10_1101-2021_01_02_425006 57 5 to to IN 10_1101-2021_01_02_425006 57 6 88 88 CD 10_1101-2021_01_02_425006 57 7 the the DT 10_1101-2021_01_02_425006 57 8 E. E. NNP 10_1101-2021_01_02_425006 57 9 coli coli NNS 10_1101-2021_01_02_425006 57 10 genome genome NN 10_1101-2021_01_02_425006 57 11 using use VBG 10_1101-2021_01_02_425006 57 12 Burrows Burrows NNP 10_1101-2021_01_02_425006 57 13 - - HYPH 10_1101-2021_01_02_425006 57 14 Wheeler Wheeler NNP 10_1101-2021_01_02_425006 57 15 Aligner Aligner NNP 10_1101-2021_01_02_425006 57 16 ( ( -LRB- 10_1101-2021_01_02_425006 57 17 BWA BWA NNP 10_1101-2021_01_02_425006 57 18 ) ) -RRB- 10_1101-2021_01_02_425006 57 19 with with IN 10_1101-2021_01_02_425006 57 20 the the DT 10_1101-2021_01_02_425006 57 21 default default NN 10_1101-2021_01_02_425006 57 22 parameters parameter NNS 10_1101-2021_01_02_425006 57 23 ( ( -LRB- 10_1101-2021_01_02_425006 57 24 31 31 CD 10_1101-2021_01_02_425006 57 25 ) ) -RRB- 10_1101-2021_01_02_425006 57 26 . . . 10_1101-2021_01_02_425006 58 1 Read read VB 10_1101-2021_01_02_425006 58 2 89 89 CD 10_1101-2021_01_02_425006 58 3 alignment alignment NN 10_1101-2021_01_02_425006 58 4 and and CC 10_1101-2021_01_02_425006 58 5 other other JJ 10_1101-2021_01_02_425006 58 6 computational computational JJ 10_1101-2021_01_02_425006 58 7 analyses analysis NNS 10_1101-2021_01_02_425006 58 8 were be VBD 10_1101-2021_01_02_425006 58 9 carried carry VBN 10_1101-2021_01_02_425006 58 10 out out RP 10_1101-2021_01_02_425006 58 11 using use VBG 10_1101-2021_01_02_425006 58 12 the the DT 10_1101-2021_01_02_425006 58 13 E. E. 
NNP 10_1101-2021_01_02_425006 58 14 coli coli NNS 10_1101-2021_01_02_425006 58 15 genome genome VBP 10_1101-2021_01_02_425006 58 16 NC_000913.3 nc_000913.3 RB 10_1101-2021_01_02_425006 58 17 , , , 10_1101-2021_01_02_425006 58 18 90 90 CD 10_1101-2021_01_02_425006 58 19 and and CC 10_1101-2021_01_02_425006 58 20 the the DT 10_1101-2021_01_02_425006 58 21 corresponding correspond VBG 10_1101-2021_01_02_425006 58 22 gene gene NN 10_1101-2021_01_02_425006 58 23 annotations annotation NNS 10_1101-2021_01_02_425006 58 24 ( ( -LRB- 10_1101-2021_01_02_425006 58 25 GCF_000005845.2_ASM584v2_genomic.gff GCF_000005845.2_ASM584v2_genomic.gff NNP 10_1101-2021_01_02_425006 58 26 ) ) -RRB- 10_1101-2021_01_02_425006 58 27 were be VBD 10_1101-2021_01_02_425006 58 28 91 91 CD 10_1101-2021_01_02_425006 58 29 downloaded download VBN 10_1101-2021_01_02_425006 58 30 from from IN 10_1101-2021_01_02_425006 58 31 NCBI NCBI NNP 10_1101-2021_01_02_425006 58 32 . . . 10_1101-2021_01_02_425006 59 1 Two two CD 10_1101-2021_01_02_425006 59 2 experimentally experimentally RB 10_1101-2021_01_02_425006 59 3 verified verify VBN 10_1101-2021_01_02_425006 59 4 ATU ATU NNP 10_1101-2021_01_02_425006 59 5 datasets dataset NNS 10_1101-2021_01_02_425006 59 6 , , , 10_1101-2021_01_02_425006 59 7 SMRT_M9Enrich SMRT_M9Enrich NNP 10_1101-2021_01_02_425006 59 8 and and CC 10_1101-2021_01_02_425006 59 9 92 92 CD 10_1101-2021_01_02_425006 59 10 SMRT_RiEnrich smrt_rienrich CD 10_1101-2021_01_02_425006 59 11 , , , 10_1101-2021_01_02_425006 59 12 were be VBD 10_1101-2021_01_02_425006 59 13 used use VBN 10_1101-2021_01_02_425006 59 14 as as IN 10_1101-2021_01_02_425006 59 15 the the DT 10_1101-2021_01_02_425006 59 16 benchmark benchmark JJ 10_1101-2021_01_02_425006 59 17 data datum NNS 10_1101-2021_01_02_425006 59 18 to to TO 10_1101-2021_01_02_425006 59 19 evaluate evaluate VB 10_1101-2021_01_02_425006 59 20 the the DT 10_1101-2021_01_02_425006 59 21 predicted predict VBN 10_1101-2021_01_02_425006 59 22 ATUs atu NNS 
10_1101-2021_01_02_425006 59 23 , , , 10_1101-2021_01_02_425006 59 24 which which WDT 10_1101-2021_01_02_425006 59 25 were be VBD 10_1101-2021_01_02_425006 59 26 93 93 CD 
10_1101-2021_01_02_425006 60 25 generated generate VBN 10_1101-2021_01_02_425006 60 26 by by IN 10_1101-2021_01_02_425006 60 27 SMRT SMRT NNP 10_1101-2021_01_02_425006 60 28 - - HYPH 10_1101-2021_01_02_425006 60 29 Cappable cappable JJ 10_1101-2021_01_02_425006 60 30 - - HYPH 10_1101-2021_01_02_425006 60 31 seq seq NN 10_1101-2021_01_02_425006 60 32 under under IN 10_1101-2021_01_02_425006 60 33 the the DT 10_1101-2021_01_02_425006 60 34 same same JJ 10_1101-2021_01_02_425006 60 35 conditions condition NNS 10_1101-2021_01_02_425006 60 36 as as IN 10_1101-2021_01_02_425006 60 37 the the DT 10_1101-2021_01_02_425006 60 38 Illumina Illumina NNP 
10_1101-2021_01_02_425006 60 39 datasets dataset VBZ 10_1101-2021_01_02_425006 60 40 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 60 41 94 94 CD 10_1101-2021_01_02_425006 60 42 and and CC 10_1101-2021_01_02_425006 60 43 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 60 44 , , , 10_1101-2021_01_02_425006 60 45 respectively respectively RB 10_1101-2021_01_02_425006 60 46 ( ( -LRB- 10_1101-2021_01_02_425006 60 47 6 6 CD 10_1101-2021_01_02_425006 60 48 ) ) -RRB- 10_1101-2021_01_02_425006 60 49 . . . 10_1101-2021_01_02_425006 61 1 In in IN 10_1101-2021_01_02_425006 61 2 addition addition NN 10_1101-2021_01_02_425006 61 3 , , , 10_1101-2021_01_02_425006 61 4 the the DT 10_1101-2021_01_02_425006 61 5 ATUs atu NNS 10_1101-2021_01_02_425006 61 6 defined define VBN 10_1101-2021_01_02_425006 61 7 by by IN 10_1101-2021_01_02_425006 61 8 RegulonDB regulondb NN 10_1101-2021_01_02_425006 61 9 ( ( -LRB- 10_1101-2021_01_02_425006 61 10 19 19 CD 10_1101-2021_01_02_425006 61 11 ) ) -RRB- 10_1101-2021_01_02_425006 61 12 and and CC 10_1101-2021_01_02_425006 61 13 SEnd SEnd NNP 10_1101-2021_01_02_425006 61 14 - - HYPH 10_1101-2021_01_02_425006 61 15 seq seq NNP 10_1101-2021_01_02_425006 61 16 ( ( -LRB- 10_1101-2021_01_02_425006 61 17 7 7 CD 10_1101-2021_01_02_425006 61 18 ) ) -RRB- 10_1101-2021_01_02_425006 61 19 95 95 CD 10_1101-2021_01_02_425006 61 20 were be VBD 10_1101-2021_01_02_425006 61 21 also also RB 10_1101-2021_01_02_425006 61 22 used use VBN 10_1101-2021_01_02_425006 61 23 as as IN 10_1101-2021_01_02_425006 61 24 additional additional JJ 10_1101-2021_01_02_425006 61 25 evaluation evaluation NN 10_1101-2021_01_02_425006 61 26 data datum NNS 10_1101-2021_01_02_425006 61 27 in in IN 10_1101-2021_01_02_425006 61 28 our -PRON- PRP$ 10_1101-2021_01_02_425006 61 29 study study NN 10_1101-2021_01_02_425006 61 30 . . . 
10_1101-2021_01_02_425006 62 1 96 96 CD 10_1101-2021_01_02_425006 62 2 Calculation calculation NN 10_1101-2021_01_02_425006 62 3 of of IN 10_1101-2021_01_02_425006 62 4 the the DT 10_1101-2021_01_02_425006 62 5 expression expression NN 10_1101-2021_01_02_425006 62 6 values value NNS 10_1101-2021_01_02_425006 62 7 of of IN 10_1101-2021_01_02_425006 62 8 genetic genetic JJ 10_1101-2021_01_02_425006 62 9 and and CC 10_1101-2021_01_02_425006 62 10 intergenic intergenic JJ 10_1101-2021_01_02_425006 62 11 regions region NNS 10_1101-2021_01_02_425006 62 12 97 97 CD 10_1101-2021_01_02_425006 62 13 After after IN 10_1101-2021_01_02_425006 62 14 the the DT 10_1101-2021_01_02_425006 62 15 RNA RNA NNP 10_1101-2021_01_02_425006 62 16 - - HYPH 10_1101-2021_01_02_425006 62 17 Seq Seq NNP 10_1101-2021_01_02_425006 62 18 reads read VBZ 10_1101-2021_01_02_425006 62 19 in in IN 10_1101-2021_01_02_425006 62 20 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 62 21 and and CC 10_1101-2021_01_02_425006 62 22 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 62 23 were be VBD 10_1101-2021_01_02_425006 62 24 mapped map VBN 10_1101-2021_01_02_425006 62 25 to to IN 10_1101-2021_01_02_425006 62 26 the the DT 10_1101-2021_01_02_425006 62 27 E. E. NNP 10_1101-2021_01_02_425006 62 28 coli coli NNS 10_1101-2021_01_02_425006 62 29 genome genome NNP 10_1101-2021_01_02_425006 62 30 using use VBG 10_1101-2021_01_02_425006 62 31 98 98 CD 10_1101-2021_01_02_425006 62 32 BWA BWA NNP 10_1101-2021_01_02_425006 62 33 , , , 10_1101-2021_01_02_425006 62 34 we -PRON- PRP 10_1101-2021_01_02_425006 62 35 determined determine VBD 10_1101-2021_01_02_425006 62 36 the the DT 10_1101-2021_01_02_425006 62 37 number number NN 10_1101-2021_01_02_425006 62 38 of of IN 10_1101-2021_01_02_425006 62 39 reads read NNS 10_1101-2021_01_02_425006 62 40 � � . 
10_1101-2021_01_02_425006 62 41 ( ( -LRB- 10_1101-2021_01_02_425006 62 42 � � NNP 10_1101-2021_01_02_425006 62 43 ) ) -RRB- 10_1101-2021_01_02_425006 62 44 covering cover VBG 10_1101-2021_01_02_425006 62 45 each each DT 10_1101-2021_01_02_425006 62 46 genomic genomic JJ 10_1101-2021_01_02_425006 62 47 position position NN 10_1101-2021_01_02_425006 62 48 � � NNP 10_1101-2021_01_02_425006 62 49 . . . 10_1101-2021_01_02_425006 63 1 Suppose suppose VB 10_1101-2021_01_02_425006 63 2 that that IN 10_1101-2021_01_02_425006 63 3 � � NNP 10_1101-2021_01_02_425006 63 4 � � NNP 10_1101-2021_01_02_425006 63 5 99 99 CD 10_1101-2021_01_02_425006 63 6 and and CC 10_1101-2021_01_02_425006 63 7 � � NNP 10_1101-2021_01_02_425006 63 8 � � NNP 10_1101-2021_01_02_425006 63 9 � � NNS 10_1101-2021_01_02_425006 63 10 � � NNS 10_1101-2021_01_02_425006 63 11 are be VBP 10_1101-2021_01_02_425006 63 12 two two CD 10_1101-2021_01_02_425006 63 13 consecutive consecutive JJ 10_1101-2021_01_02_425006 63 14 genes gene NNS 10_1101-2021_01_02_425006 63 15 on on IN 10_1101-2021_01_02_425006 63 16 the the DT 10_1101-2021_01_02_425006 63 17 same same JJ 10_1101-2021_01_02_425006 63 18 strand strand NN 10_1101-2021_01_02_425006 63 19 ; ; : 10_1101-2021_01_02_425006 63 20 we -PRON- PRP 10_1101-2021_01_02_425006 63 21 denote denote VBP 10_1101-2021_01_02_425006 63 22 the the DT 10_1101-2021_01_02_425006 63 23 expression expression NN 10_1101-2021_01_02_425006 63 24 value value NN 10_1101-2021_01_02_425006 63 25 of of IN 10_1101-2021_01_02_425006 63 26 � � NNP 10_1101-2021_01_02_425006 63 27 � � NNP 10_1101-2021_01_02_425006 63 28 as as IN 10_1101-2021_01_02_425006 63 29 � � NNP 10_1101-2021_01_02_425006 63 30 � � NNP 10_1101-2021_01_02_425006 63 31 100 100 CD 10_1101-2021_01_02_425006 63 32 and and CC 10_1101-2021_01_02_425006 63 33 the the DT 10_1101-2021_01_02_425006 63 34 expression expression NN 10_1101-2021_01_02_425006 63 35 value value NN 10_1101-2021_01_02_425006 63 36 of of IN 
10_1101-2021_01_02_425006 63 37 the the DT 10_1101-2021_01_02_425006 63 38 intergenic intergenic JJ 10_1101-2021_01_02_425006 63 39 region region NN 10_1101-2021_01_02_425006 63 40 between between IN 10_1101-2021_01_02_425006 63 41 genes gene NNS 10_1101-2021_01_02_425006 63 42 � � NNPS 10_1101-2021_01_02_425006 63 43 � � NNP 10_1101-2021_01_02_425006 63 44 and and CC 10_1101-2021_01_02_425006 63 45 � � NNP 10_1101-2021_01_02_425006 63 46 � � NNP 10_1101-2021_01_02_425006 63 47 � � NNS 10_1101-2021_01_02_425006 63 48 � � , 10_1101-2021_01_02_425006 63 49 as as IN 10_1101-2021_01_02_425006 63 50 � � NNP 10_1101-2021_01_02_425006 63 51 � � NNP 10_1101-2021_01_02_425006 63 52 , , , 10_1101-2021_01_02_425006 63 53 � � NNP 10_1101-2021_01_02_425006 63 54 � � NNP 10_1101-2021_01_02_425006 63 55 � � NNP 10_1101-2021_01_02_425006 63 56 . . . 10_1101-2021_01_02_425006 64 1 Then then RB 10_1101-2021_01_02_425006 64 2 , , , 10_1101-2021_01_02_425006 64 3 the the DT 10_1101-2021_01_02_425006 64 4 101 101 CD 10_1101-2021_01_02_425006 64 5 calculation calculation NN 10_1101-2021_01_02_425006 64 6 of of IN 10_1101-2021_01_02_425006 64 7 � � NNP 10_1101-2021_01_02_425006 64 8 � � NNP 10_1101-2021_01_02_425006 64 9 and and CC 10_1101-2021_01_02_425006 64 10 � � NNP 10_1101-2021_01_02_425006 64 11 � � NNP 10_1101-2021_01_02_425006 64 12 , , , 10_1101-2021_01_02_425006 64 13 � � NNP 10_1101-2021_01_02_425006 64 14 � � NNP 10_1101-2021_01_02_425006 64 15 � � NNP 10_1101-2021_01_02_425006 64 16 is be VBZ 10_1101-2021_01_02_425006 64 17 given give VBN 10_1101-2021_01_02_425006 64 18 by by IN 10_1101-2021_01_02_425006 64 19 : : : 10_1101-2021_01_02_425006 64 20 102 102 CD 10_1101-2021_01_02_425006 64 21 � � NNS 10_1101-2021_01_02_425006 64 22 � � ADD 10_1101-2021_01_02_425006 64 23 = = NFP 10_1101-2021_01_02_425006 64 24 ∑ ∑ . 
10_1101-2021_01_02_425006 64 25 � � NNP 10_1101-2021_01_02_425006 64 26 ( ( -LRB- 10_1101-2021_01_02_425006 64 27 � � NNP 10_1101-2021_01_02_425006 64 28 ) ) -RRB- 10_1101-2021_01_02_425006 64 29 � � VBZ 10_1101-2021_01_02_425006 64 30 ∈ ∈ JJ 10_1101-2021_01_02_425006 64 31 � � NNS 10_1101-2021_01_02_425006 64 32 � � JJ 10_1101-2021_01_02_425006 64 33 | | NNP 10_1101-2021_01_02_425006 64 34 � � NNP 10_1101-2021_01_02_425006 64 35 � � NNP 10_1101-2021_01_02_425006 64 36 | | NNS 10_1101-2021_01_02_425006 64 37 ( ( -LRB- 10_1101-2021_01_02_425006 64 38 1 1 CD 10_1101-2021_01_02_425006 64 39 ) ) -RRB- 10_1101-2021_01_02_425006 64 40 � � NNP 10_1101-2021_01_02_425006 64 41 � � NNP 10_1101-2021_01_02_425006 64 42 , , , 10_1101-2021_01_02_425006 64 43 � � NNP 10_1101-2021_01_02_425006 64 44 � � NNP 10_1101-2021_01_02_425006 64 45 � � NNP 10_1101-2021_01_02_425006 64 46 = = SYM 10_1101-2021_01_02_425006 64 47 ∑ ∑ . 10_1101-2021_01_02_425006 64 48 � � NNP 10_1101-2021_01_02_425006 64 49 ( ( -LRB- 10_1101-2021_01_02_425006 64 50 � � NNP 10_1101-2021_01_02_425006 64 51 ) ) -RRB- 10_1101-2021_01_02_425006 64 52 � � VBZ 10_1101-2021_01_02_425006 64 53 ∈ ∈ JJ 10_1101-2021_01_02_425006 64 54 � � NNS 10_1101-2021_01_02_425006 64 55 � � NNS 10_1101-2021_01_02_425006 64 56 , , , 10_1101-2021_01_02_425006 64 57 � � NNP 10_1101-2021_01_02_425006 64 58 � � NNP 10_1101-2021_01_02_425006 64 59 � � JJ 10_1101-2021_01_02_425006 64 60 | | NNP 10_1101-2021_01_02_425006 64 61 � � NNP 10_1101-2021_01_02_425006 64 62 � � NNP 10_1101-2021_01_02_425006 64 63 , , , 10_1101-2021_01_02_425006 64 64 � � NNP 10_1101-2021_01_02_425006 64 65 � � NNP 10_1101-2021_01_02_425006 64 66 � � NNS 10_1101-2021_01_02_425006 64 67 | | NN 10_1101-2021_01_02_425006 64 68 ( ( -LRB- 10_1101-2021_01_02_425006 64 69 2 2 CD 10_1101-2021_01_02_425006 64 70 ) ) -RRB- 10_1101-2021_01_02_425006 64 71 where where WRB 10_1101-2021_01_02_425006 64 72 � � . 
10_1101-2021_01_02_425006 64 73 ∈ ∈ IN 10_1101-2021_01_02_425006 64 74 � � NNS 10_1101-2021_01_02_425006 64 75 � � NNP 10_1101-2021_01_02_425006 64 76 denotes denote VBZ 10_1101-2021_01_02_425006 64 77 that that IN 10_1101-2021_01_02_425006 64 78 genomic genomic JJ 10_1101-2021_01_02_425006 64 79 position position NN 10_1101-2021_01_02_425006 64 80 � � NNP 10_1101-2021_01_02_425006 64 81 is be VBZ 10_1101-2021_01_02_425006 64 82 on on IN 10_1101-2021_01_02_425006 64 83 the the DT 10_1101-2021_01_02_425006 64 84 gene gene NN 10_1101-2021_01_02_425006 64 85 � � NNP 10_1101-2021_01_02_425006 64 86 � � NNP 10_1101-2021_01_02_425006 64 87 and and CC 10_1101-2021_01_02_425006 64 88 | | NNP 10_1101-2021_01_02_425006 64 89 � � NNP 10_1101-2021_01_02_425006 64 90 � � NNP 10_1101-2021_01_02_425006 64 91 | | NNP 10_1101-2021_01_02_425006 64 92 denotes denote VBZ 10_1101-2021_01_02_425006 64 93 the the DT 10_1101-2021_01_02_425006 64 94 genomic genomic JJ 10_1101-2021_01_02_425006 64 95 103 103 CD 10_1101-2021_01_02_425006 64 96 length length NN 10_1101-2021_01_02_425006 64 97 of of IN 10_1101-2021_01_02_425006 64 98 � � NNP 10_1101-2021_01_02_425006 64 99 � � NNP 10_1101-2021_01_02_425006 64 100 . . . 
10_1101-2021_01_02_425006 65 1 104 104 CD 10_1101-2021_01_02_425006 65 2 Modeling model VBG 10_1101-2021_01_02_425006 65 3 non non JJ 10_1101-2021_01_02_425006 65 4 - - JJ 10_1101-2021_01_02_425006 65 5 uniform uniform JJ 10_1101-2021_01_02_425006 65 6 read read VBD 10_1101-2021_01_02_425006 65 7 distribution distribution NN 10_1101-2021_01_02_425006 65 8 along along IN 10_1101-2021_01_02_425006 65 9 mRNA mRNA NNP 10_1101-2021_01_02_425006 65 10 transcripts transcript NNS 10_1101-2021_01_02_425006 65 11 105 105 CD 10_1101-2021_01_02_425006 65 12 We -PRON- PRP 10_1101-2021_01_02_425006 65 13 introduced introduce VBD 10_1101-2021_01_02_425006 65 14 the the DT 10_1101-2021_01_02_425006 65 15 bias bias NN 10_1101-2021_01_02_425006 65 16 rate rate NN 10_1101-2021_01_02_425006 65 17 function function NN 10_1101-2021_01_02_425006 65 18 , , , 10_1101-2021_01_02_425006 65 19 which which WDT 10_1101-2021_01_02_425006 65 20 is be VBZ 10_1101-2021_01_02_425006 65 21 similar similar JJ 10_1101-2021_01_02_425006 65 22 to to IN 10_1101-2021_01_02_425006 65 23 the the DT 10_1101-2021_01_02_425006 65 24 bias bias NN 10_1101-2021_01_02_425006 65 25 curves curve NNS 10_1101-2021_01_02_425006 65 26 in in IN 10_1101-2021_01_02_425006 65 27 the the DT 10_1101-2021_01_02_425006 65 28 work work NN 10_1101-2021_01_02_425006 65 29 of of IN 10_1101-2021_01_02_425006 65 30 Wu Wu NNP 10_1101-2021_01_02_425006 65 31 et et NNP 10_1101-2021_01_02_425006 65 32 al al NNP 10_1101-2021_01_02_425006 65 33 . . . 
10_1101-2021_01_02_425006 66 1 ( ( -LRB- 10_1101-2021_01_02_425006 66 2 32 32 CD 10_1101-2021_01_02_425006 66 3 ) ) -RRB- 10_1101-2021_01_02_425006 66 4 , , , 10_1101-2021_01_02_425006 66 5 to to IN 10_1101-2021_01_02_425006 66 6 106 106 CD 10_1101-2021_01_02_425006 66 7 address address NN 10_1101-2021_01_02_425006 66 8 the the DT 10_1101-2021_01_02_425006 66 9 non non JJ 10_1101-2021_01_02_425006 66 10 - - JJ 10_1101-2021_01_02_425006 66 11 uniform uniform JJ 10_1101-2021_01_02_425006 66 12 distribution distribution NN 10_1101-2021_01_02_425006 66 13 of of IN 10_1101-2021_01_02_425006 66 14 the the DT 10_1101-2021_01_02_425006 66 15 RNA RNA NNP 10_1101-2021_01_02_425006 66 16 - - HYPH 10_1101-2021_01_02_425006 66 17 Seq Seq NNP 10_1101-2021_01_02_425006 66 18 reads read VBZ 10_1101-2021_01_02_425006 66 19 along along IN 10_1101-2021_01_02_425006 66 20 mRNA mRNA NNP 10_1101-2021_01_02_425006 66 21 transcripts transcript NNS 10_1101-2021_01_02_425006 66 22 ( ( -LRB- 10_1101-2021_01_02_425006 66 23 32 32 CD 10_1101-2021_01_02_425006 66 24 - - SYM 10_1101-2021_01_02_425006 66 25 35 35 CD 10_1101-2021_01_02_425006 66 26 ) ) -RRB- 10_1101-2021_01_02_425006 66 27 . . . 
10_1101-2021_01_02_425006 67 1 The the DT 10_1101-2021_01_02_425006 67 2 bias bias NN 10_1101-2021_01_02_425006 67 3 107 107 CD 10_1101-2021_01_02_425006 67 4 function function NN 10_1101-2021_01_02_425006 67 5 reflects reflect VBZ 10_1101-2021_01_02_425006 67 6 the the DT 10_1101-2021_01_02_425006 67 7 relative relative JJ 10_1101-2021_01_02_425006 67 8 read read VBP 10_1101-2021_01_02_425006 67 9 distribution distribution NN 10_1101-2021_01_02_425006 67 10 bias bias NN 10_1101-2021_01_02_425006 67 11 from from IN 10_1101-2021_01_02_425006 67 12 the the DT 10_1101-2021_01_02_425006 67 13 3 3 CD 10_1101-2021_01_02_425006 67 14 ’ ' '' 10_1101-2021_01_02_425006 67 15 end end NN 10_1101-2021_01_02_425006 67 16 to to IN 10_1101-2021_01_02_425006 67 17 the the DT 10_1101-2021_01_02_425006 67 18 5 5 CD 10_1101-2021_01_02_425006 67 19 ’ ' '' 10_1101-2021_01_02_425006 67 20 end end NN 10_1101-2021_01_02_425006 67 21 of of IN 10_1101-2021_01_02_425006 67 22 an an DT 10_1101-2021_01_02_425006 67 23 mRNA mrna NN 10_1101-2021_01_02_425006 67 24 transcript transcript NN 10_1101-2021_01_02_425006 67 25 . . . 
10_1101-2021_01_02_425006 69 25 We -PRON- PRP 10_1101-2021_01_02_425006 69 26 assumed assume VBD 10_1101-2021_01_02_425006 69 27 that that IN 10_1101-2021_01_02_425006 69 28 the the DT 10_1101-2021_01_02_425006 69 29 maximum maximum JJ 10_1101-2021_01_02_425006 69 30 read read JJ 10_1101-2021_01_02_425006 69 31 coverage coverage NN 10_1101-2021_01_02_425006 69 32 of of IN 10_1101-2021_01_02_425006 69 33 all all PDT 10_1101-2021_01_02_425006 69 34 the the DT 10_1101-2021_01_02_425006 69 35 genomic genomic JJ 10_1101-2021_01_02_425006 69 36 positions position NNS 10_1101-2021_01_02_425006 69 37 of of IN 10_1101-2021_01_02_425006 69 38 an an DT
10_1101-2021_01_02_425006 69 39 mRNA mrna NN 10_1101-2021_01_02_425006 69 40 transcript transcript NN 10_1101-2021_01_02_425006 69 41 is be VBZ 10_1101-2021_01_02_425006 69 42 the the DT 10_1101-2021_01_02_425006 69 43 109 109 CD 10_1101-2021_01_02_425006 69 44 expression expression NN 10_1101-2021_01_02_425006 69 45 level level NN 10_1101-2021_01_02_425006 69 46 without without IN 10_1101-2021_01_02_425006 69 47 bias bias NN 10_1101-2021_01_02_425006 69 48 . . . 10_1101-2021_01_02_425006 70 1 It -PRON- PRP 10_1101-2021_01_02_425006 70 2 is be VBZ 10_1101-2021_01_02_425006 70 3 noteworthy noteworthy JJ 10_1101-2021_01_02_425006 70 4 that that IN 10_1101-2021_01_02_425006 70 5 a a DT 10_1101-2021_01_02_425006 70 6 single single JJ 10_1101-2021_01_02_425006 70 7 gene gene NN 10_1101-2021_01_02_425006 70 8 mRNA mrna NN 10_1101-2021_01_02_425006 70 9 transcript transcript NN 10_1101-2021_01_02_425006 70 10 with with IN 10_1101-2021_01_02_425006 70 11 no no DT 10_1101-2021_01_02_425006 70 12 shared shared JJ 10_1101-2021_01_02_425006 70 13 gene gene NN 10_1101-2021_01_02_425006 70 14 110 110 CD 10_1101-2021_01_02_425006 70 15 among among IN 10_1101-2021_01_02_425006 70 16 different different JJ 10_1101-2021_01_02_425006 70 17 mRNA mRNA NNS 10_1101-2021_01_02_425006 70 18 transcripts transcript NNS 10_1101-2021_01_02_425006 70 19 can can MD 10_1101-2021_01_02_425006 70 20 serve serve VB 10_1101-2021_01_02_425006 70 21 as as IN 10_1101-2021_01_02_425006 70 22 the the DT 10_1101-2021_01_02_425006 70 23 ideal ideal JJ 10_1101-2021_01_02_425006 70 24 template template NN 10_1101-2021_01_02_425006 70 25 for for IN 10_1101-2021_01_02_425006 70 26 modeling model VBG 10_1101-2021_01_02_425006 70 27 non non JJ 10_1101-2021_01_02_425006 70 28 - - JJ 10_1101-2021_01_02_425006 70 29 uniform uniform JJ 10_1101-2021_01_02_425006 70 30 read read VBD 10_1101-2021_01_02_425006 70 31 111 111 CD 10_1101-2021_01_02_425006 70 32 distribution distribution NN 10_1101-2021_01_02_425006 70 33 
along along IN 10_1101-2021_01_02_425006 70 34 mRNA mRNA NNP 10_1101-2021_01_02_425006 70 35 transcripts transcript NNS 10_1101-2021_01_02_425006 70 36 . . . 10_1101-2021_01_02_425006 71 1 The the DT 10_1101-2021_01_02_425006 71 2 specific specific JJ 10_1101-2021_01_02_425006 71 3 steps step NNS 10_1101-2021_01_02_425006 71 4 of of IN 10_1101-2021_01_02_425006 71 5 modeling model VBG 10_1101-2021_01_02_425006 71 6 non non JJ 10_1101-2021_01_02_425006 71 7 - - JJ 10_1101-2021_01_02_425006 71 8 uniform uniform JJ 10_1101-2021_01_02_425006 71 9 read read NN 10_1101-2021_01_02_425006 71 10 distribution distribution NN 10_1101-2021_01_02_425006 71 11 are be VBP 10_1101-2021_01_02_425006 71 12 112 112 CD 10_1101-2021_01_02_425006 71 13 detailed detail VBN 10_1101-2021_01_02_425006 71 14 as as IN 10_1101-2021_01_02_425006 71 15 follows follow VBZ 10_1101-2021_01_02_425006 71 16 : : : 10_1101-2021_01_02_425006 71 17 113 113 CD 10_1101-2021_01_02_425006 71 18 Step step NN 10_1101-2021_01_02_425006 71 19 1 1 CD 10_1101-2021_01_02_425006 71 20 : : : 10_1101-2021_01_02_425006 71 21 Single single JJ 10_1101-2021_01_02_425006 71 22 Gene gene NN 10_1101-2021_01_02_425006 71 23 mRNA mrna NN 10_1101-2021_01_02_425006 71 24 Transcript Transcript NNP 10_1101-2021_01_02_425006 71 25 Selection Selection NNP 10_1101-2021_01_02_425006 71 26 . . . 
10_1101-2021_01_02_425006 72 1 We -PRON- PRP 10_1101-2021_01_02_425006 72 2 selected select VBD 10_1101-2021_01_02_425006 72 3 single single JJ 10_1101-2021_01_02_425006 72 4 gene gene NN 10_1101-2021_01_02_425006 72 5 mRNA mrna NN 10_1101-2021_01_02_425006 72 6 transcripts transcript NNS 10_1101-2021_01_02_425006 72 7 from from IN 10_1101-2021_01_02_425006 72 8 the the DT 10_1101-2021_01_02_425006 72 9 114 114 CD 10_1101-2021_01_02_425006 72 10 evaluation evaluation NN 10_1101-2021_01_02_425006 72 11 data datum NNS 10_1101-2021_01_02_425006 72 12 and and CC 10_1101-2021_01_02_425006 72 13 plotted plot VBD 10_1101-2021_01_02_425006 72 14 their -PRON- PRP$ 10_1101-2021_01_02_425006 72 15 expression expression NN 10_1101-2021_01_02_425006 72 16 distributions distribution NNS 10_1101-2021_01_02_425006 72 17 . . . 10_1101-2021_01_02_425006 73 1 Specifically specifically RB 10_1101-2021_01_02_425006 73 2 , , , 10_1101-2021_01_02_425006 73 3 12 12 CD 10_1101-2021_01_02_425006 73 4 groups group NNS 10_1101-2021_01_02_425006 73 5 of of IN 10_1101-2021_01_02_425006 73 6 single single JJ 10_1101-2021_01_02_425006 73 7 gene gene NN 10_1101-2021_01_02_425006 73 8 mRNA mrna CD 10_1101-2021_01_02_425006 73 9 115 115 CD 10_1101-2021_01_02_425006 73 10 transcripts transcript NNS 10_1101-2021_01_02_425006 73 11 with with IN 10_1101-2021_01_02_425006 73 12 lengths length NNS 10_1101-2021_01_02_425006 73 13 ranging range VBG 10_1101-2021_01_02_425006 73 14 from from IN 10_1101-2021_01_02_425006 73 15 300 300 CD 10_1101-2021_01_02_425006 73 16 to to IN 10_1101-2021_01_02_425006 73 17 1,500 1,500 CD 10_1101-2021_01_02_425006 73 18 bp bp NN 10_1101-2021_01_02_425006 73 19 were be VBD 10_1101-2021_01_02_425006 73 20 selected select VBN 10_1101-2021_01_02_425006 73 21 from from IN 10_1101-2021_01_02_425006 73 22 the the DT 10_1101-2021_01_02_425006 73 23 evaluation evaluation NN 10_1101-2021_01_02_425006 73 24 data datum NNS 10_1101-2021_01_02_425006 73 25 ( ( -LRB- 
10_1101-2021_01_02_425006 73 26 more more JJR 10_1101-2021_01_02_425006 73 27 116 116 CD 10_1101-2021_01_02_425006 73 28 details detail NNS 10_1101-2021_01_02_425006 73 29 are be VBP 10_1101-2021_01_02_425006 73 30 given give VBN 10_1101-2021_01_02_425006 73 31 in in IN 10_1101-2021_01_02_425006 73 32 method method NN 10_1101-2021_01_02_425006 73 33 S1 S1 NNP 10_1101-2021_01_02_425006 73 34 ) ) -RRB- 10_1101-2021_01_02_425006 73 35 , , , 10_1101-2021_01_02_425006 73 36 and and CC 10_1101-2021_01_02_425006 73 37 each each DT 10_1101-2021_01_02_425006 73 38 group group NN 10_1101-2021_01_02_425006 73 39 had have VBD 10_1101-2021_01_02_425006 73 40 ten ten CD 10_1101-2021_01_02_425006 73 41 randomly randomly RB 10_1101-2021_01_02_425006 73 42 chosen choose VBN 10_1101-2021_01_02_425006 73 43 mRNA mRNA NNS 10_1101-2021_01_02_425006 73 44 transcripts transcript NNS 10_1101-2021_01_02_425006 73 45 . . . 10_1101-2021_01_02_425006 74 1 Apparent apparent JJ 10_1101-2021_01_02_425006 74 2 117 117 CD 10_1101-2021_01_02_425006 74 3 decline decline NN 10_1101-2021_01_02_425006 74 4 trends trend NNS 10_1101-2021_01_02_425006 74 5 appeared appear VBD 10_1101-2021_01_02_425006 74 6 in in IN 10_1101-2021_01_02_425006 74 7 the the DT 10_1101-2021_01_02_425006 74 8 single single JJ 10_1101-2021_01_02_425006 74 9 gene gene NN 10_1101-2021_01_02_425006 74 10 mRNA mrna NN 10_1101-2021_01_02_425006 74 11 transcripts transcript NNS 10_1101-2021_01_02_425006 74 12 with with IN 10_1101-2021_01_02_425006 74 13 long long JJ 10_1101-2021_01_02_425006 74 14 lengths length NNS 10_1101-2021_01_02_425006 74 15 , , , 10_1101-2021_01_02_425006 74 16 ranging range VBG 10_1101-2021_01_02_425006 74 17 from from IN 10_1101-2021_01_02_425006 74 18 1,100 1,100 CD 10_1101-2021_01_02_425006 74 19 to to TO 10_1101-2021_01_02_425006 74 20 118 118 CD 10_1101-2021_01_02_425006 74 21 1,500 1,500 CD 10_1101-2021_01_02_425006 74 22 bp bp NNS 10_1101-2021_01_02_425006 74 23 ( ( -LRB- 10_1101-2021_01_02_425006 74 24 
fig fig NN 10_1101-2021_01_02_425006 74 25 . . . 10_1101-2021_01_02_425006 75 1 S2 s2 NN 10_1101-2021_01_02_425006 75 2 ) ) -RRB- 10_1101-2021_01_02_425006 75 3 . . . 10_1101-2021_01_02_425006 76 1 The the DT 10_1101-2021_01_02_425006 76 2 reason reason NN 10_1101-2021_01_02_425006 76 3 for for IN 10_1101-2021_01_02_425006 76 4 this this DT 10_1101-2021_01_02_425006 76 5 phenomenon phenomenon NN 10_1101-2021_01_02_425006 76 6 may may MD 10_1101-2021_01_02_425006 76 7 be be VB 10_1101-2021_01_02_425006 76 8 that that IN 10_1101-2021_01_02_425006 76 9 the the DT 10_1101-2021_01_02_425006 76 10 incomplete incomplete JJ 10_1101-2021_01_02_425006 76 11 transcription transcription NN 10_1101-2021_01_02_425006 76 12 and and CC 10_1101-2021_01_02_425006 76 13 3 3 CD 10_1101-2021_01_02_425006 76 14 ’ ' '' 10_1101-2021_01_02_425006 76 15 end end VB 10_1101-2021_01_02_425006 76 16 119 119 CD 10_1101-2021_01_02_425006 76 17 degradation degradation NN 10_1101-2021_01_02_425006 76 18 or or CC 10_1101-2021_01_02_425006 76 19 processing processing NN 10_1101-2021_01_02_425006 76 20 induce induce VBP 10_1101-2021_01_02_425006 76 21 the the DT 10_1101-2021_01_02_425006 76 22 enrichment enrichment NN 10_1101-2021_01_02_425006 76 23 of of IN 10_1101-2021_01_02_425006 76 24 signal signal NN 10_1101-2021_01_02_425006 76 25 at at IN 10_1101-2021_01_02_425006 76 26 5 5 CD 10_1101-2021_01_02_425006 76 27 ’ ' '' 10_1101-2021_01_02_425006 76 28 end end NN 10_1101-2021_01_02_425006 76 29 of of IN 10_1101-2021_01_02_425006 76 30 the the DT 10_1101-2021_01_02_425006 76 31 mRNA mrna NN 10_1101-2021_01_02_425006 76 32 transcripts transcript NNS 10_1101-2021_01_02_425006 76 33 with with IN 10_1101-2021_01_02_425006 76 34 long long JJ 10_1101-2021_01_02_425006 76 35 120 120 CD 10_1101-2021_01_02_425006 76 36 lengths length NNS 10_1101-2021_01_02_425006 76 37 ( ( -LRB- 10_1101-2021_01_02_425006 76 38 36 36 CD 10_1101-2021_01_02_425006 76 39 , , , 10_1101-2021_01_02_425006 76 40 37 37 CD 
10_1101-2021_01_02_425006 76 41 ) ) -RRB- 10_1101-2021_01_02_425006 76 42 . . . 10_1101-2021_01_02_425006 77 1 Finally finally RB 10_1101-2021_01_02_425006 77 2 , , , 10_1101-2021_01_02_425006 77 3 we -PRON- PRP 10_1101-2021_01_02_425006 77 4 plotted plot VBD 10_1101-2021_01_02_425006 77 5 the the DT 10_1101-2021_01_02_425006 77 6 expression expression NN 10_1101-2021_01_02_425006 77 7 distribution distribution NN 10_1101-2021_01_02_425006 77 8 of of IN 10_1101-2021_01_02_425006 77 9 single single JJ 10_1101-2021_01_02_425006 77 10 gene gene NN 10_1101-2021_01_02_425006 77 11 mRNA mrna NN 10_1101-2021_01_02_425006 77 12 transcripts transcript NNS 10_1101-2021_01_02_425006 77 13 with with IN 10_1101-2021_01_02_425006 77 14 121 121 CD 10_1101-2021_01_02_425006 77 15 lengths length NNS 10_1101-2021_01_02_425006 77 16 ranging range VBG 10_1101-2021_01_02_425006 77 17 from from IN 10_1101-2021_01_02_425006 77 18 1,100 1,100 CD 10_1101-2021_01_02_425006 77 19 to to IN 10_1101-2021_01_02_425006 77 20 1,500 1,500 CD 10_1101-2021_01_02_425006 77 21 bp bp NN 10_1101-2021_01_02_425006 77 22 . . . 10_1101-2021_01_02_425006 78 1 122 122 CD 10_1101-2021_01_02_425006 78 2 Step step NN 10_1101-2021_01_02_425006 78 3 2 2 CD 10_1101-2021_01_02_425006 78 4 : : : 10_1101-2021_01_02_425006 78 5 Acquiring acquire VBG 10_1101-2021_01_02_425006 78 6 the the DT 10_1101-2021_01_02_425006 78 7 Bias Bias NNP 10_1101-2021_01_02_425006 78 8 Rate Rate NNP 10_1101-2021_01_02_425006 78 9 Function Function NNP 10_1101-2021_01_02_425006 78 10 . . . 
10_1101-2021_01_02_425006 79 1 We -PRON- PRP 10_1101-2021_01_02_425006 79 2 applied apply VBD 10_1101-2021_01_02_425006 79 3 nonlinear nonlinear JJ 10_1101-2021_01_02_425006 79 4 regression regression NN 10_1101-2021_01_02_425006 79 5 to to IN 10_1101-2021_01_02_425006 79 6 the the DT 10_1101-2021_01_02_425006 79 7 expression expression NN 10_1101-2021_01_02_425006 79 8 123 123 CD 10_1101-2021_01_02_425006 79 9 distribution distribution NN 10_1101-2021_01_02_425006 79 10 of of IN 10_1101-2021_01_02_425006 79 11 the the DT 10_1101-2021_01_02_425006 79 12 selected select VBN 10_1101-2021_01_02_425006 79 13 single single JJ 10_1101-2021_01_02_425006 79 14 gene gene NN 10_1101-2021_01_02_425006 79 15 mRNA mrna NN 10_1101-2021_01_02_425006 79 16 transcripts transcript NNS 10_1101-2021_01_02_425006 79 17 and and CC 10_1101-2021_01_02_425006 79 18 acquired acquire VBD 10_1101-2021_01_02_425006 79 19 the the DT 10_1101-2021_01_02_425006 79 20 hypothetical hypothetical JJ 10_1101-2021_01_02_425006 79 21 function function NN 10_1101-2021_01_02_425006 79 22 � � . 10_1101-2021_01_02_425006 79 23 ( ( -LRB- 10_1101-2021_01_02_425006 79 24 � � NNP 10_1101-2021_01_02_425006 79 25 ) ) -RRB- 10_1101-2021_01_02_425006 79 26 . . . 
10_1101-2021_01_02_425006 80 1 124 124 CD 10_1101-2021_01_02_425006 80 2 Specifically specifically RB 10_1101-2021_01_02_425006 80 3 , , , 10_1101-2021_01_02_425006 80 4 the the DT 10_1101-2021_01_02_425006 80 5 � � NNP 10_1101-2021_01_02_425006 80 6 axis axis JJ 10_1101-2021_01_02_425006 80 7 and and CC 10_1101-2021_01_02_425006 80 8 � � JJ 10_1101-2021_01_02_425006 80 9 axis axis RB 10_1101-2021_01_02_425006 80 10 of of IN 10_1101-2021_01_02_425006 80 11 the the DT 10_1101-2021_01_02_425006 80 12 expression expression NN 10_1101-2021_01_02_425006 80 13 distribution distribution NN 10_1101-2021_01_02_425006 80 14 were be VBD 10_1101-2021_01_02_425006 80 15 converted convert VBN 10_1101-2021_01_02_425006 80 16 to to IN 10_1101-2021_01_02_425006 80 17 the the DT 10_1101-2021_01_02_425006 80 18 distance distance NN 10_1101-2021_01_02_425006 80 19 from from IN 10_1101-2021_01_02_425006 80 20 125 125 CD 10_1101-2021_01_02_425006 80 21 the the DT 10_1101-2021_01_02_425006 80 22 3 3 CD 10_1101-2021_01_02_425006 80 23 ’ ' '' 10_1101-2021_01_02_425006 80 24 end end NN 10_1101-2021_01_02_425006 80 25 of of IN 10_1101-2021_01_02_425006 80 26 an an DT 10_1101-2021_01_02_425006 80 27 mRNA mrna NN 10_1101-2021_01_02_425006 80 28 transcript transcript NN 10_1101-2021_01_02_425006 80 29 and and CC 10_1101-2021_01_02_425006 80 30 the the DT 10_1101-2021_01_02_425006 80 31 bias bias NN 10_1101-2021_01_02_425006 80 32 rate rate NN 10_1101-2021_01_02_425006 80 33 of of IN 10_1101-2021_01_02_425006 80 34 read read VBN 10_1101-2021_01_02_425006 80 35 distribution distribution NN 10_1101-2021_01_02_425006 80 36 , , , 10_1101-2021_01_02_425006 80 37 respectively respectively RB 10_1101-2021_01_02_425006 80 38 . . . 
10_1101-2021_01_02_425006 81 1 To to TO 10_1101-2021_01_02_425006 81 2 apply apply VB 10_1101-2021_01_02_425006 81 3 nonlinear nonlinear JJ 10_1101-2021_01_02_425006 81 4 126 126 CD 10_1101-2021_01_02_425006 81 5 regression regression NN 10_1101-2021_01_02_425006 81 6 to to IN 10_1101-2021_01_02_425006 81 7 single single JJ 10_1101-2021_01_02_425006 81 8 gene gene NN 10_1101-2021_01_02_425006 81 9 mRNA mrna NN 10_1101-2021_01_02_425006 81 10 transcripts transcript NNS 10_1101-2021_01_02_425006 81 11 with with IN 10_1101-2021_01_02_425006 81 12 different different JJ 10_1101-2021_01_02_425006 81 13 lengths length NNS 10_1101-2021_01_02_425006 81 14 , , , 10_1101-2021_01_02_425006 81 15 normalization normalization NN 10_1101-2021_01_02_425006 81 16 was be VBD 10_1101-2021_01_02_425006 81 17 also also RB 10_1101-2021_01_02_425006 81 18 implemented implement VBN
10_1101-2021_01_02_425006 82 25 on on IN 10_1101-2021_01_02_425006 82 26 � � NNP 10_1101-2021_01_02_425006 82 27 . . . 10_1101-2021_01_02_425006 83 1 Here here RB 10_1101-2021_01_02_425006 83 2 , , , 10_1101-2021_01_02_425006 83 3 � � NNP 10_1101-2021_01_02_425006 83 4 = = NFP 10_1101-2021_01_02_425006 83 5 ( ( -LRB- 10_1101-2021_01_02_425006 83 6 � � NNP 10_1101-2021_01_02_425006 83 7 � � NNP 10_1101-2021_01_02_425006 83 8 , , , 10_1101-2021_01_02_425006 83 9 � � NNP 10_1101-2021_01_02_425006 83 10 � � NNP 10_1101-2021_01_02_425006 83 11 , , , 10_1101-2021_01_02_425006 83 12 … … NFP 10_1101-2021_01_02_425006 83 13 , , , 10_1101-2021_01_02_425006 83 14 � � NNP 10_1101-2021_01_02_425006 83 15 � � NNP 10_1101-2021_01_02_425006 83 16 ) ) -RRB- 10_1101-2021_01_02_425006 83 17 and and CC 10_1101-2021_01_02_425006 83 18 � � NNPS 10_1101-2021_01_02_425006 83 19 = = NFP 10_1101-2021_01_02_425006 83 20 ( ( -LRB- 10_1101-2021_01_02_425006 83 21 � � NNP 10_1101-2021_01_02_425006 83 22 � � NNP 10_1101-2021_01_02_425006 83 23 , , , 10_1101-2021_01_02_425006 83 24 � � NNP 10_1101-2021_01_02_425006 83 25 � � NNP 10_1101-2021_01_02_425006 83 26 , , , 10_1101-2021_01_02_425006 83 27 … … NFP 10_1101-2021_01_02_425006 83 28 , , , 10_1101-2021_01_02_425006 83 29 � � NNP 10_1101-2021_01_02_425006 83 30 � � NNP 10_1101-2021_01_02_425006 83 31 ) ) -RRB- 10_1101-2021_01_02_425006 83 32 are be VBP
10_1101-2021_01_02_425006 83 33 defined define VBN 10_1101-2021_01_02_425006 83 34 by by IN 10_1101-2021_01_02_425006 83 35 : : : 10_1101-2021_01_02_425006 83 36 128 128 CD 10_1101-2021_01_02_425006 83 37 � � NNS 10_1101-2021_01_02_425006 83 38 � � ADD 10_1101-2021_01_02_425006 83 39 = = NFP 10_1101-2021_01_02_425006 83 40 ⎩ ⎩ CD 10_1101-2021_01_02_425006 83 41 ⎨ ⎨ NNP 10_1101-2021_01_02_425006 83 42 ⎧ ⎧ NNP 10_1101-2021_01_02_425006 83 43 � � , 10_1101-2021_01_02_425006 83 44 � � NNS 10_1101-2021_01_02_425006 83 45 − − NNP 10_1101-2021_01_02_425006 83 46 � � VBZ 10_1101-2021_01_02_425006 83 47 � � NN 10_1101-2021_01_02_425006 83 48 � � NNS 10_1101-2021_01_02_425006 83 49 � � VBZ 10_1101-2021_01_02_425006 83 50 � � VBZ 10_1101-2021_01_02_425006 83 51 � � VBZ 10_1101-2021_01_02_425006 83 52 � � VBZ 10_1101-2021_01_02_425006 83 53 � � VBZ 10_1101-2021_01_02_425006 83 54 � � NNS 10_1101-2021_01_02_425006 83 55 � � NNS 10_1101-2021_01_02_425006 83 56 − − NNP 10_1101-2021_01_02_425006 83 57 � � NNP 10_1101-2021_01_02_425006 83 58 � � NNP 10_1101-2021_01_02_425006 83 59 × × CD 10_1101-2021_01_02_425006 83 60 10 10 CD 10_1101-2021_01_02_425006 83 61 � � NNS 10_1101-2021_01_02_425006 83 62 , , , 10_1101-2021_01_02_425006 83 63 � � NNP 10_1101-2021_01_02_425006 83 64 � � ADD 10_1101-2021_01_02_425006 83 65 � � VBZ 10_1101-2021_01_02_425006 83 66 � � VBZ 10_1101-2021_01_02_425006 83 67 � � VBZ 10_1101-2021_01_02_425006 83 68 � � VBZ 10_1101-2021_01_02_425006 83 69 � � VBZ 10_1101-2021_01_02_425006 83 70 � � NNS 10_1101-2021_01_02_425006 83 71 � � NNS 10_1101-2021_01_02_425006 83 72 − − NNP 10_1101-2021_01_02_425006 83 73 � � VBZ 10_1101-2021_01_02_425006 83 74 � � NN 10_1101-2021_01_02_425006 83 75 � � NNS 10_1101-2021_01_02_425006 83 76 � � JJ 10_1101-2021_01_02_425006 83 77 � � NNS 10_1101-2021_01_02_425006 83 78 � � NNS 10_1101-2021_01_02_425006 83 79 − − NNP 10_1101-2021_01_02_425006 83 80 � � NNP 10_1101-2021_01_02_425006 83 81 � � NNP 10_1101-2021_01_02_425006 83 82 × 
× CD 10_1101-2021_01_02_425006 83 83 10 10 CD 10_1101-2021_01_02_425006 83 84 � � NNS 10_1101-2021_01_02_425006 83 85 , , , 10_1101-2021_01_02_425006 83 86 � � NNP 10_1101-2021_01_02_425006 83 87 � � ADD 10_1101-2021_01_02_425006 83 88 � � VBZ 10_1101-2021_01_02_425006 83 89 � � VBZ 10_1101-2021_01_02_425006 83 90 � � JJ 10_1101-2021_01_02_425006 83 91 � � NNS 10_1101-2021_01_02_425006 83 92 � � NNP 10_1101-2021_01_02_425006 83 93 ( ( -LRB- 10_1101-2021_01_02_425006 83 94 3 3 LS 10_1101-2021_01_02_425006 83 95 ) ) -RRB- 10_1101-2021_01_02_425006 83 96 � � ADD 10_1101-2021_01_02_425006 83 97 � � NNP 10_1101-2021_01_02_425006 83 98 = = NFP 10_1101-2021_01_02_425006 83 99 ⎩ ⎩ NNP 10_1101-2021_01_02_425006 83 100 ⎪ ⎪ NNP 10_1101-2021_01_02_425006 83 101 ⎨ ⎨ NNP 10_1101-2021_01_02_425006 83 102 ⎪ ⎪ NNP 10_1101-2021_01_02_425006 83 103 ⎧ ⎧ XX 10_1101-2021_01_02_425006 83 104 � � NNP 10_1101-2021_01_02_425006 83 105 ( ( -LRB- 10_1101-2021_01_02_425006 83 106 � � NNP 10_1101-2021_01_02_425006 83 107 � � JJ 10_1101-2021_01_02_425006 83 108 � � NNS 10_1101-2021_01_02_425006 83 109 � � JJ 10_1101-2021_01_02_425006 83 110 � � NNS 10_1101-2021_01_02_425006 83 111 � � NN 10_1101-2021_01_02_425006 83 112 ) ) -RRB- 10_1101-2021_01_02_425006 83 113 � � NNP 10_1101-2021_01_02_425006 83 114 � � NNP 10_1101-2021_01_02_425006 83 115 � � NNP 10_1101-2021_01_02_425006 83 116 � � NNP 10_1101-2021_01_02_425006 83 117 , , , 10_1101-2021_01_02_425006 83 118 � � NNP 10_1101-2021_01_02_425006 83 119 � � ADD 10_1101-2021_01_02_425006 83 120 � � VBZ 10_1101-2021_01_02_425006 83 121 � � VBZ 10_1101-2021_01_02_425006 83 122 � � VBZ 10_1101-2021_01_02_425006 83 123 � � JJ 10_1101-2021_01_02_425006 83 124 � � NNS 10_1101-2021_01_02_425006 83 125 � � . 
10_1101-2021_01_02_425006 83 126 ( ( -LRB- 10_1101-2021_01_02_425006 83 127 � � NNP 10_1101-2021_01_02_425006 83 128 � � NNP 10_1101-2021_01_02_425006 83 129 ) ) -RRB- 10_1101-2021_01_02_425006 83 130 � � NNP 10_1101-2021_01_02_425006 83 131 � � NNP 10_1101-2021_01_02_425006 83 132 � � NNP 10_1101-2021_01_02_425006 83 133 � � NNP 10_1101-2021_01_02_425006 83 134 , , , 10_1101-2021_01_02_425006 83 135 � � NNP 10_1101-2021_01_02_425006 83 136 � � ADD 10_1101-2021_01_02_425006 83 137 � � VBZ 10_1101-2021_01_02_425006 83 138 � � VBZ 10_1101-2021_01_02_425006 83 139 � � JJ 10_1101-2021_01_02_425006 83 140 � � NNS 10_1101-2021_01_02_425006 83 141 � � NNP 10_1101-2021_01_02_425006 83 142 ( ( -LRB- 10_1101-2021_01_02_425006 83 143 4 4 LS 10_1101-2021_01_02_425006 83 144 ) ) -RRB- 10_1101-2021_01_02_425006 83 145 where where WRB 10_1101-2021_01_02_425006 83 146 � � NNP 10_1101-2021_01_02_425006 83 147 denotes denote VBZ 10_1101-2021_01_02_425006 83 148 the the DT 10_1101-2021_01_02_425006 83 149 number number NN 10_1101-2021_01_02_425006 83 150 of of IN 10_1101-2021_01_02_425006 83 151 genomic genomic JJ 10_1101-2021_01_02_425006 83 152 positions position NNS 10_1101-2021_01_02_425006 83 153 on on IN 10_1101-2021_01_02_425006 83 154 an an DT 10_1101-2021_01_02_425006 83 155 mRNA mrna NN 10_1101-2021_01_02_425006 83 156 transcript transcript NN 10_1101-2021_01_02_425006 83 157 ; ; : 10_1101-2021_01_02_425006 83 158 � � ADD 10_1101-2021_01_02_425006 83 159 = = NFP 10_1101-2021_01_02_425006 83 160 ( ( -LRB- 10_1101-2021_01_02_425006 83 161 � � NNP 10_1101-2021_01_02_425006 83 162 � � NNP 10_1101-2021_01_02_425006 83 163 , , , 10_1101-2021_01_02_425006 83 164 � � NNP 10_1101-2021_01_02_425006 83 165 � � NNP 10_1101-2021_01_02_425006 83 166 , , , 10_1101-2021_01_02_425006 83 167 … … NFP 10_1101-2021_01_02_425006 83 168 , , , 10_1101-2021_01_02_425006 83 169 � � NNP 10_1101-2021_01_02_425006 83 170 � � NNP 10_1101-2021_01_02_425006 83 171 ) ) -RRB- 10_1101-2021_01_02_425006 83 
172 denotes denote VBZ 10_1101-2021_01_02_425006 83 173 129 129 CD 10_1101-2021_01_02_425006 83 174 the the DT 10_1101-2021_01_02_425006 83 175 genomic genomic JJ 10_1101-2021_01_02_425006 83 176 positions position NNS 10_1101-2021_01_02_425006 83 177 on on IN 10_1101-2021_01_02_425006 83 178 an an DT 10_1101-2021_01_02_425006 83 179 mRNA mrna NN 10_1101-2021_01_02_425006 83 180 transcript transcript NN 10_1101-2021_01_02_425006 83 181 ; ; : 10_1101-2021_01_02_425006 83 182 � � NNP 10_1101-2021_01_02_425006 83 183 � � , 10_1101-2021_01_02_425006 83 184 � � NNS 10_1101-2021_01_02_425006 83 185 � � NNP 10_1101-2021_01_02_425006 83 186 = = VBZ 10_1101-2021_01_02_425006 83 187 � � NNP 10_1101-2021_01_02_425006 83 188 � � NNP 10_1101-2021_01_02_425006 83 189 ; ; : 10_1101-2021_01_02_425006 83 190 � � NNP 10_1101-2021_01_02_425006 83 191 ( ( -LRB- 10_1101-2021_01_02_425006 83 192 � � NNP 10_1101-2021_01_02_425006 83 193 � � NNP 10_1101-2021_01_02_425006 83 194 ) ) -RRB- 10_1101-2021_01_02_425006 83 195 denotes denote VBZ 10_1101-2021_01_02_425006 83 196 the the DT 10_1101-2021_01_02_425006 83 197 expression expression NN 10_1101-2021_01_02_425006 83 198 level level NN 10_1101-2021_01_02_425006 83 199 of of IN 10_1101-2021_01_02_425006 83 200 the the DT 10_1101-2021_01_02_425006 83 201 130 130 CD 10_1101-2021_01_02_425006 83 202 genomic genomic JJ 10_1101-2021_01_02_425006 83 203 position position NN 10_1101-2021_01_02_425006 83 204 � � NNP 10_1101-2021_01_02_425006 83 205 � � NNP 10_1101-2021_01_02_425006 83 206 , , , 10_1101-2021_01_02_425006 83 207 i.e. i.e. 
FW 10_1101-2021_01_02_425006 83 208 , , , 10_1101-2021_01_02_425006 83 209 the the DT 10_1101-2021_01_02_425006 83 210 number number NN 10_1101-2021_01_02_425006 83 211 of of IN 10_1101-2021_01_02_425006 83 212 reads read NNS 10_1101-2021_01_02_425006 83 213 covering cover VBG 10_1101-2021_01_02_425006 83 214 the the DT 10_1101-2021_01_02_425006 83 215 genomic genomic JJ 10_1101-2021_01_02_425006 83 216 position position NN 10_1101-2021_01_02_425006 83 217 � � ADD 10_1101-2021_01_02_425006 83 218 � � NNP 10_1101-2021_01_02_425006 83 219 ; ; , 10_1101-2021_01_02_425006 83 220 and and CC 10_1101-2021_01_02_425006 83 221 � � JJ 10_1101-2021_01_02_425006 83 222 � � NNP 10_1101-2021_01_02_425006 83 223 � � NNP 10_1101-2021_01_02_425006 83 224 � � NNP 10_1101-2021_01_02_425006 83 225 denotes denote VBZ 10_1101-2021_01_02_425006 83 226 the the DT 10_1101-2021_01_02_425006 83 227 131 131 CD 10_1101-2021_01_02_425006 83 228 expression expression NN 10_1101-2021_01_02_425006 83 229 level level NN 10_1101-2021_01_02_425006 83 230 without without IN 10_1101-2021_01_02_425006 83 231 bias bias NN 10_1101-2021_01_02_425006 83 232 in in IN 10_1101-2021_01_02_425006 83 233 an an DT 10_1101-2021_01_02_425006 83 234 mRNA mrna NN 10_1101-2021_01_02_425006 83 235 transcript transcript NN 10_1101-2021_01_02_425006 83 236 , , , 10_1101-2021_01_02_425006 83 237 which which WDT 10_1101-2021_01_02_425006 83 238 is be VBZ 10_1101-2021_01_02_425006 83 239 calculated calculate VBN 10_1101-2021_01_02_425006 83 240 as as IN 10_1101-2021_01_02_425006 83 241 � � NNP 10_1101-2021_01_02_425006 83 242 � � ADD 10_1101-2021_01_02_425006 83 243 � � , 10_1101-2021_01_02_425006 83 244 { { -LRB- 10_1101-2021_01_02_425006 83 245 � � NNP 10_1101-2021_01_02_425006 83 246 ( ( -LRB- 10_1101-2021_01_02_425006 83 247 � � NNP 10_1101-2021_01_02_425006 83 248 � � NNP 10_1101-2021_01_02_425006 83 249 ) ) -RRB- 10_1101-2021_01_02_425006 83 250 } } -RRB- 10_1101-2021_01_02_425006 83 251 , , , 10_1101-2021_01_02_425006 
83 252 1 1 CD 10_1101-2021_01_02_425006 83 253 ≤ ≤ NN 10_1101-2021_01_02_425006 83 254 � � NNP 10_1101-2021_01_02_425006 83 255 ≤ ≤ NN 10_1101-2021_01_02_425006 83 256 � � NNP 10_1101-2021_01_02_425006 83 257 . . . 10_1101-2021_01_02_425006 84 1 132 132 CD 10_1101-2021_01_02_425006 84 2 We -PRON- PRP 10_1101-2021_01_02_425006 84 3 used use VBD 10_1101-2021_01_02_425006 84 4 the the DT 10_1101-2021_01_02_425006 84 5 function function NN 10_1101-2021_01_02_425006 84 6 nls nls JJ 10_1101-2021_01_02_425006 84 7 in in IN 10_1101-2021_01_02_425006 84 8 R R NNP 10_1101-2021_01_02_425006 84 9 to to TO 10_1101-2021_01_02_425006 84 10 acquire acquire VB 10_1101-2021_01_02_425006 84 11 the the DT 10_1101-2021_01_02_425006 84 12 hypothetical hypothetical JJ 10_1101-2021_01_02_425006 84 13 function function NN 10_1101-2021_01_02_425006 84 14 � � . 10_1101-2021_01_02_425006 84 15 ( ( -LRB- 10_1101-2021_01_02_425006 84 16 � � NNP 10_1101-2021_01_02_425006 84 17 ) ) -RRB- 10_1101-2021_01_02_425006 84 18 . . . 10_1101-2021_01_02_425006 85 1 133 133 CD 10_1101-2021_01_02_425006 85 2 Step step NN 10_1101-2021_01_02_425006 85 3 3 3 CD 10_1101-2021_01_02_425006 85 4 : : : 10_1101-2021_01_02_425006 85 5 Constructing construct VBG 10_1101-2021_01_02_425006 85 6 Bias Bias NNP 10_1101-2021_01_02_425006 85 7 Rate Rate NNP 10_1101-2021_01_02_425006 85 8 Vectors Vectors NNPS 10_1101-2021_01_02_425006 85 9 . . . 
10_1101-2021_01_02_425006 86 1 We -PRON- PRP 10_1101-2021_01_02_425006 86 2 constructed construct VBD 10_1101-2021_01_02_425006 86 3 a a DT 10_1101-2021_01_02_425006 86 4 genetic genetic JJ 10_1101-2021_01_02_425006 86 5 or or CC 10_1101-2021_01_02_425006 86 6 intergenic intergenic JJ 10_1101-2021_01_02_425006 86 7 region region NN 10_1101-2021_01_02_425006 86 8 bias bias NN 10_1101-2021_01_02_425006 86 9 rate rate NN 10_1101-2021_01_02_425006 86 10 vector vector NN 10_1101-2021_01_02_425006 86 11 134 134 CD 10_1101-2021_01_02_425006 86 12 for for IN 10_1101-2021_01_02_425006 86 13 each each DT 10_1101-2021_01_02_425006 86 14 mRNA mrna NN 10_1101-2021_01_02_425006 86 15 transcript transcript NN 10_1101-2021_01_02_425006 86 16 by by IN 10_1101-2021_01_02_425006 86 17 calculating calculate VBG 10_1101-2021_01_02_425006 86 18 the the DT 10_1101-2021_01_02_425006 86 19 bias bias NN 10_1101-2021_01_02_425006 86 20 rate rate NN 10_1101-2021_01_02_425006 86 21 of of IN 10_1101-2021_01_02_425006 86 22 all all DT 10_1101-2021_01_02_425006 86 23 of of IN 10_1101-2021_01_02_425006 86 24 its -PRON- PRP$ 10_1101-2021_01_02_425006 86 25 component component NN 10_1101-2021_01_02_425006 86 26 genetic genetic JJ 10_1101-2021_01_02_425006 86 27 or or CC 10_1101-2021_01_02_425006 86 28 intergenic intergenic JJ 10_1101-2021_01_02_425006 86 29 135 135 CD 10_1101-2021_01_02_425006 86 30 regions region NNS 10_1101-2021_01_02_425006 86 31 . . . 
10_1101-2021_01_02_425006 87 1 The the DT 10_1101-2021_01_02_425006 87 2 bias bias NN 10_1101-2021_01_02_425006 87 3 rate rate NN 10_1101-2021_01_02_425006 87 4 of of IN 10_1101-2021_01_02_425006 87 5 a a DT 10_1101-2021_01_02_425006 87 6 genetic genetic JJ 10_1101-2021_01_02_425006 87 7 or or CC 10_1101-2021_01_02_425006 87 8 an an DT 10_1101-2021_01_02_425006 87 9 intergenic intergenic JJ 10_1101-2021_01_02_425006 87 10 region region NN 10_1101-2021_01_02_425006 87 11 is be VBZ 10_1101-2021_01_02_425006 87 12 the the DT 10_1101-2021_01_02_425006 87 13 average average JJ 10_1101-2021_01_02_425006 87 14 bias bias NN 10_1101-2021_01_02_425006 87 15 rate rate NN 10_1101-2021_01_02_425006 87 16 of of IN 10_1101-2021_01_02_425006 87 17 all all PDT 10_1101-2021_01_02_425006 87 18 the the DT 10_1101-2021_01_02_425006 87 19 genomic genomic JJ 10_1101-2021_01_02_425006 87 20 136 136 CD 10_1101-2021_01_02_425006 87 21 positions position NNS 10_1101-2021_01_02_425006 87 22 that that IN 10_1101-2021_01_02_425006 87 23 it -PRON- PRP 10_1101-2021_01_02_425006 87 24 contains contain VBZ 10_1101-2021_01_02_425006 87 25 . . . 
10_1101-2021_01_02_425006 88 1 Considering consider VBG 10_1101-2021_01_02_425006 88 2 an an DT 10_1101-2021_01_02_425006 88 3 mRNA mrna NN 10_1101-2021_01_02_425006 88 4 transcript transcript NN 10_1101-2021_01_02_425006 88 5 � � NNP 10_1101-2021_01_02_425006 88 6 and and CC 10_1101-2021_01_02_425006 88 7 its -PRON- PRP$ 10_1101-2021_01_02_425006 88 8 component component NN 10_1101-2021_01_02_425006 88 9 gene gene NN 10_1101-2021_01_02_425006 88 10 set set VBD 10_1101-2021_01_02_425006 88 11 137 137 CD 10_1101-2021_01_02_425006 88 12 { { -LRB- 10_1101-2021_01_02_425006 88 13 � � NNP 10_1101-2021_01_02_425006 88 14 � � NNP 10_1101-2021_01_02_425006 88 15 , , , 10_1101-2021_01_02_425006 88 16 � � NNP 10_1101-2021_01_02_425006 88 17 � � NNP 10_1101-2021_01_02_425006 88 18 , , , 10_1101-2021_01_02_425006 88 19 … … NFP 10_1101-2021_01_02_425006 88 20 , , , 10_1101-2021_01_02_425006 88 21 � � NNP 10_1101-2021_01_02_425006 88 22 � � . 10_1101-2021_01_02_425006 88 23 } } -RRB- 10_1101-2021_01_02_425006 88 24 ( ( -LRB- 10_1101-2021_01_02_425006 88 25 the the DT 10_1101-2021_01_02_425006 88 26 details detail NNS 10_1101-2021_01_02_425006 88 27 of of IN 10_1101-2021_01_02_425006 88 28 the the DT 10_1101-2021_01_02_425006 88 29 gene gene NN 10_1101-2021_01_02_425006 88 30 labels label NNS 10_1101-2021_01_02_425006 88 31 are be VBP 10_1101-2021_01_02_425006 88 32 described describe VBN 10_1101-2021_01_02_425006 88 33 in in IN 10_1101-2021_01_02_425006 88 34 method method NNP 10_1101-2021_01_02_425006 88 35 S2 S2 NNP 10_1101-2021_01_02_425006 88 36 ) ) -RRB- 10_1101-2021_01_02_425006 88 37 , , , 10_1101-2021_01_02_425006 88 38 we -PRON- PRP 10_1101-2021_01_02_425006 88 39 denoted denote VBD 10_1101-2021_01_02_425006 88 40 the the DT 10_1101-2021_01_02_425006 88 41 genetic genetic JJ 10_1101-2021_01_02_425006 88 42 138 138 CD 10_1101-2021_01_02_425006 88 43 region region NN 10_1101-2021_01_02_425006 88 44 bias bias NN 10_1101-2021_01_02_425006 88 45 rate rate NN 
10_1101-2021_01_02_425006 88 46 vector vector NN 10_1101-2021_01_02_425006 88 47 as as IN 10_1101-2021_01_02_425006 88 48 � � NNP 10_1101-2021_01_02_425006 88 49 = = NFP 10_1101-2021_01_02_425006 88 50 ( ( -LRB- 10_1101-2021_01_02_425006 88 51 � � NNP 10_1101-2021_01_02_425006 88 52 � � NNP 10_1101-2021_01_02_425006 88 53 , , , 10_1101-2021_01_02_425006 88 54 � � NNP 10_1101-2021_01_02_425006 88 55 � � NNP 10_1101-2021_01_02_425006 88 56 , , , 10_1101-2021_01_02_425006 88 57 … … NFP 10_1101-2021_01_02_425006 88 58 , , , 10_1101-2021_01_02_425006 88 59 � � NNP 10_1101-2021_01_02_425006 88 60 � � NNP 10_1101-2021_01_02_425006 88 61 ) ) -RRB- 10_1101-2021_01_02_425006 88 62 , , , 10_1101-2021_01_02_425006 88 63 which which WDT 10_1101-2021_01_02_425006 88 64 was be VBD 10_1101-2021_01_02_425006 88 65 calculated calculate VBN 10_1101-2021_01_02_425006 88 66 using use VBG 10_1101-2021_01_02_425006 88 67 the the DT 10_1101-2021_01_02_425006 88 68 formula formula NN 10_1101-2021_01_02_425006 88 69 : : : 10_1101-2021_01_02_425006 88 70 139 139 CD 10_1101-2021_01_02_425006 88 71 � � NNP 10_1101-2021_01_02_425006 88 72 � � NNP 10_1101-2021_01_02_425006 88 73 = = SYM 10_1101-2021_01_02_425006 88 74 ⎩ ⎩ NNP 10_1101-2021_01_02_425006 88 75 ⎪ ⎪ NNP 10_1101-2021_01_02_425006 88 76 ⎨ ⎨ NNP 10_1101-2021_01_02_425006 88 77 ⎪ ⎪ NNP 10_1101-2021_01_02_425006 88 78 ⎧ ⎧ NNP 10_1101-2021_01_02_425006 88 79 ∑ ∑ . 
10_1101-2021_01_02_425006 88 80 � � NNP 10_1101-2021_01_02_425006 88 81 ( ( -LRB- 10_1101-2021_01_02_425006 88 82 � � NNP 10_1101-2021_01_02_425006 88 83 � � NNP 10_1101-2021_01_02_425006 88 84 ) ) -RRB- 10_1101-2021_01_02_425006 88 85 � � NNP 10_1101-2021_01_02_425006 88 86 � � ADD 10_1101-2021_01_02_425006 88 87 � � VBZ 10_1101-2021_01_02_425006 88 88 � � VBZ 10_1101-2021_01_02_425006 88 89 � � VBZ 10_1101-2021_01_02_425006 88 90 � � VBZ 10_1101-2021_01_02_425006 88 91 � � VBZ 10_1101-2021_01_02_425006 88 92 � � VBZ 10_1101-2021_01_02_425006 88 93 � � VBZ 10_1101-2021_01_02_425006 88 94 � � VBZ 10_1101-2021_01_02_425006 88 95 � � VBZ 10_1101-2021_01_02_425006 88 96 � � VBZ 10_1101-2021_01_02_425006 88 97 � � VBZ 10_1101-2021_01_02_425006 88 98 � � VBZ 10_1101-2021_01_02_425006 88 99 � � VBZ 10_1101-2021_01_02_425006 88 100 � � VBZ 10_1101-2021_01_02_425006 88 101 � � VBZ 10_1101-2021_01_02_425006 88 102 � � VBZ 10_1101-2021_01_02_425006 88 103 � � VBZ 10_1101-2021_01_02_425006 88 104 � � NNS 10_1101-2021_01_02_425006 88 105 � � NNS 10_1101-2021_01_02_425006 88 106 − − NNP 10_1101-2021_01_02_425006 88 107 � � VBZ 10_1101-2021_01_02_425006 88 108 � � NN 10_1101-2021_01_02_425006 88 109 � � NNS 10_1101-2021_01_02_425006 88 110 � � VBZ 10_1101-2021_01_02_425006 88 111 � � JJ 10_1101-2021_01_02_425006 88 112 � � NNS 10_1101-2021_01_02_425006 88 113 � � NN 10_1101-2021_01_02_425006 88 114 + + CC 10_1101-2021_01_02_425006 88 115 1 1 CD 10_1101-2021_01_02_425006 88 116 , , , 10_1101-2021_01_02_425006 88 117 � � NNP 10_1101-2021_01_02_425006 88 118 � � ADD 10_1101-2021_01_02_425006 88 119 � � VBZ 10_1101-2021_01_02_425006 88 120 � � VBZ 10_1101-2021_01_02_425006 88 121 � � NN 10_1101-2021_01_02_425006 88 122 � � NNS 10_1101-2021_01_02_425006 88 123 � � NNP 10_1101-2021_01_02_425006 88 124 ∑ ∑ . 
10_1101-2021_01_02_425006 88 125 � � NNP 10_1101-2021_01_02_425006 88 126 ( ( -LRB- 10_1101-2021_01_02_425006 88 127 � � NNP 10_1101-2021_01_02_425006 88 128 � � NNP 10_1101-2021_01_02_425006 88 129 ) ) -RRB- 10_1101-2021_01_02_425006 88 130 � � NNP 10_1101-2021_01_02_425006 88 131 � � ADD 10_1101-2021_01_02_425006 88 132 � � VBZ 10_1101-2021_01_02_425006 88 133 � � VBZ 10_1101-2021_01_02_425006 88 134 � � VBZ 10_1101-2021_01_02_425006 88 135 � � VBZ 10_1101-2021_01_02_425006 88 136 � � VBZ 10_1101-2021_01_02_425006 88 137 � � NNS 10_1101-2021_01_02_425006 88 138 � � NNS 10_1101-2021_01_02_425006 88 139 − − NNP 10_1101-2021_01_02_425006 88 140 � � NNP 10_1101-2021_01_02_425006 88 141 � � NNP 10_1101-2021_01_02_425006 88 142 � � NN 10_1101-2021_01_02_425006 88 143 + + CC 10_1101-2021_01_02_425006 88 144 1 1 CD 10_1101-2021_01_02_425006 88 145 , , , 10_1101-2021_01_02_425006 88 146 � � NNP 10_1101-2021_01_02_425006 88 147 � � ADD 10_1101-2021_01_02_425006 88 148 � � VBZ 10_1101-2021_01_02_425006 88 149 � � VBZ 10_1101-2021_01_02_425006 88 150 � � JJ 10_1101-2021_01_02_425006 88 151 � � NNS 10_1101-2021_01_02_425006 88 152 � � NNP 10_1101-2021_01_02_425006 88 153 ( ( -LRB- 10_1101-2021_01_02_425006 88 154 5 5 CD 10_1101-2021_01_02_425006 88 155 ) ) -RRB-
10_1101-2021_01_02_425006 89 25 where where WRB 10_1101-2021_01_02_425006 89 26 � � NNP 10_1101-2021_01_02_425006 89 27 denotes denote VBZ 10_1101-2021_01_02_425006 89 28 the the DT 10_1101-2021_01_02_425006 89 29 number number NN 10_1101-2021_01_02_425006 89 30 of of IN 10_1101-2021_01_02_425006 89 31 genomic genomic JJ 10_1101-2021_01_02_425006 89 32 positions position NNS 10_1101-2021_01_02_425006 89 33 on on IN 10_1101-2021_01_02_425006 89 34 � � NNP 10_1101-2021_01_02_425006 89 35 ; ; , 10_1101-2021_01_02_425006 89 36 � � NNP 10_1101-2021_01_02_425006 89 37 � � NNP 10_1101-2021_01_02_425006 89 38 denotes denote VBZ 10_1101-2021_01_02_425006 89 39 the the DT 10_1101-2021_01_02_425006 89 40 bias bias NN 10_1101-2021_01_02_425006 89 41 rate rate NN 10_1101-2021_01_02_425006 89 42 of of IN 10_1101-2021_01_02_425006 89 43 � � NNP 10_1101-2021_01_02_425006 89 44 � � NNP 10_1101-2021_01_02_425006 89 45 for for IN 10_1101-2021_01_02_425006 89 46 � � NNP 10_1101-2021_01_02_425006 89 47 ; ; , 10_1101-2021_01_02_425006 89 48 and and CC 10_1101-2021_01_02_425006 89 49 140 140 CD 10_1101-2021_01_02_425006 89 50 � � NNS 10_1101-2021_01_02_425006 89 51 � � NNS 10_1101-2021_01_02_425006 89 52 = = NFP 10_1101-2021_01_02_425006 89 53 ( ( -LRB- 10_1101-2021_01_02_425006 89 54 � � NNP 10_1101-2021_01_02_425006 89 55 � � NNP 10_1101-2021_01_02_425006 89 56 � � NNP 10_1101-2021_01_02_425006 89
57 , , , 10_1101-2021_01_02_425006 89 58 � � NNP 10_1101-2021_01_02_425006 89 59 � � NNP 10_1101-2021_01_02_425006 89 60 � � NNP 10_1101-2021_01_02_425006 89 61 , , , 10_1101-2021_01_02_425006 89 62 � � NNP 10_1101-2021_01_02_425006 89 63 � � NNP 10_1101-2021_01_02_425006 89 64 � � NNP 10_1101-2021_01_02_425006 89 65 , , , 10_1101-2021_01_02_425006 89 66 � � NNP 10_1101-2021_01_02_425006 89 67 � � NNP 10_1101-2021_01_02_425006 89 68 � � NNP 10_1101-2021_01_02_425006 89 69 , , , 10_1101-2021_01_02_425006 89 70 … … NFP 10_1101-2021_01_02_425006 89 71 , , , 10_1101-2021_01_02_425006 89 72 � � NNP 10_1101-2021_01_02_425006 89 73 � � NNP 10_1101-2021_01_02_425006 89 74 � � NNP 10_1101-2021_01_02_425006 89 75 , , , 10_1101-2021_01_02_425006 89 76 � � NNP 10_1101-2021_01_02_425006 89 77 � � NNP 10_1101-2021_01_02_425006 89 78 � � NNP 10_1101-2021_01_02_425006 89 79 ) ) -RRB- 10_1101-2021_01_02_425006 89 80 is be VBZ 10_1101-2021_01_02_425006 89 81 the the DT 10_1101-2021_01_02_425006 89 82 range range NN 10_1101-2021_01_02_425006 89 83 of of IN 10_1101-2021_01_02_425006 89 84 the the DT 10_1101-2021_01_02_425006 89 85 genomic genomic JJ 10_1101-2021_01_02_425006 89 86 positions position NNS 10_1101-2021_01_02_425006 89 87 of of IN 10_1101-2021_01_02_425006 89 88 { { -LRB- 10_1101-2021_01_02_425006 89 89 � � NNP 10_1101-2021_01_02_425006 89 90 � � NNP 10_1101-2021_01_02_425006 89 91 , , , 10_1101-2021_01_02_425006 89 92 � � NNP 10_1101-2021_01_02_425006 89 93 � � NNP 10_1101-2021_01_02_425006 89 94 , , , 10_1101-2021_01_02_425006 89 95 … … NFP 10_1101-2021_01_02_425006 89 96 , , , 10_1101-2021_01_02_425006 89 97 � � NNP 10_1101-2021_01_02_425006 89 98 � � NNP 10_1101-2021_01_02_425006 89 99 } } -RRB- 10_1101-2021_01_02_425006 89 100 , , , 10_1101-2021_01_02_425006 89 101 while while IN 10_1101-2021_01_02_425006 89 102 the the DT 10_1101-2021_01_02_425006 89 103 141 141 CD 10_1101-2021_01_02_425006 89 104 range range NN 10_1101-2021_01_02_425006 89 105 of of IN 
10_1101-2021_01_02_425006 89 106 the the DT 10_1101-2021_01_02_425006 89 107 genomic genomic JJ 10_1101-2021_01_02_425006 89 108 positions position NNS 10_1101-2021_01_02_425006 89 109 of of IN 10_1101-2021_01_02_425006 89 110 � � NNP 10_1101-2021_01_02_425006 89 111 � � NNP 10_1101-2021_01_02_425006 89 112 is be VBZ 10_1101-2021_01_02_425006 89 113 [ [ -LRB- 10_1101-2021_01_02_425006 89 114 � � NNP 10_1101-2021_01_02_425006 89 115 � � NNP 10_1101-2021_01_02_425006 89 116 � � NNP 10_1101-2021_01_02_425006 89 117 , , , 10_1101-2021_01_02_425006 89 118 � � NNP 10_1101-2021_01_02_425006 89 119 � � NNP 10_1101-2021_01_02_425006 89 120 � � NNP 10_1101-2021_01_02_425006 89 121 ] ] -RRB- 10_1101-2021_01_02_425006 89 122 , , , 10_1101-2021_01_02_425006 89 123 1 1 CD 10_1101-2021_01_02_425006 89 124 ≤ ≤ NN 10_1101-2021_01_02_425006 89 125 � � NNP 10_1101-2021_01_02_425006 89 126 ≤ ≤ NN 10_1101-2021_01_02_425006 89 127 � � NNP 10_1101-2021_01_02_425006 89 128 . . . 10_1101-2021_01_02_425006 90 1 Similarly similarly RB 10_1101-2021_01_02_425006 90 2 , , , 10_1101-2021_01_02_425006 90 3 the the DT 10_1101-2021_01_02_425006 90 4 calculation calculation NN 10_1101-2021_01_02_425006 90 5 of of IN 10_1101-2021_01_02_425006 90 6 the the DT 10_1101-2021_01_02_425006 90 7 intergenic intergenic JJ 10_1101-2021_01_02_425006 90 8 142 142 CD 10_1101-2021_01_02_425006 90 9 region region NN 10_1101-2021_01_02_425006 90 10 bias bias NN 10_1101-2021_01_02_425006 90 11 rate rate NN 10_1101-2021_01_02_425006 90 12 vector vector NN 10_1101-2021_01_02_425006 90 13 � � , 10_1101-2021_01_02_425006 90 14 = = NFP 10_1101-2021_01_02_425006 90 15 ( ( -LRB- 10_1101-2021_01_02_425006 90 16 � � NNP 10_1101-2021_01_02_425006 90 17 � � NNP 10_1101-2021_01_02_425006 90 18 , , , 10_1101-2021_01_02_425006 90 19 � � NNP 10_1101-2021_01_02_425006 90 20 � � NNP 10_1101-2021_01_02_425006 90 21 , , , 10_1101-2021_01_02_425006 90 22 … … NFP 10_1101-2021_01_02_425006 90 23 , , , 10_1101-2021_01_02_425006 90 24 � � 
NNP 10_1101-2021_01_02_425006 90 25 � � ADD 10_1101-2021_01_02_425006 90 26 � � NNP 10_1101-2021_01_02_425006 90 27 � � NNP 10_1101-2021_01_02_425006 90 28 ) ) -RRB- 10_1101-2021_01_02_425006 90 29 is be VBZ 10_1101-2021_01_02_425006 90 30 provided provide VBN 10_1101-2021_01_02_425006 90 31 in in IN 10_1101-2021_01_02_425006 90 32 method method NNP 10_1101-2021_01_02_425006 90 33 S3 S3 NNP 10_1101-2021_01_02_425006 90 34 . . . 10_1101-2021_01_02_425006 91 1 143 143 CD 10_1101-2021_01_02_425006 91 2 Modification Modification NNP 10_1101-2021_01_02_425006 91 3 of of IN 10_1101-2021_01_02_425006 91 4 maximal maximal JJ 10_1101-2021_01_02_425006 91 5 ATU ATU NNP 10_1101-2021_01_02_425006 91 6 clusters cluster NNS 10_1101-2021_01_02_425006 91 7 144 144 CD 10_1101-2021_01_02_425006 91 8 A a DT 10_1101-2021_01_02_425006 91 9 maximal maximal JJ 10_1101-2021_01_02_425006 91 10 ATU ATU NNP 10_1101-2021_01_02_425006 91 11 cluster cluster NN 10_1101-2021_01_02_425006 91 12 is be VBZ 10_1101-2021_01_02_425006 91 13 defined define VBN 10_1101-2021_01_02_425006 91 14 as as IN 10_1101-2021_01_02_425006 91 15 a a DT 10_1101-2021_01_02_425006 91 16 maximal maximal JJ 10_1101-2021_01_02_425006 91 17 consecutive consecutive JJ 10_1101-2021_01_02_425006 91 18 gene gene NN 10_1101-2021_01_02_425006 91 19 set set VBD 10_1101-2021_01_02_425006 91 20 such such JJ 10_1101-2021_01_02_425006 91 21 that that IN 10_1101-2021_01_02_425006 91 22 each each DT 10_1101-2021_01_02_425006 91 23 pair pair NN 10_1101-2021_01_02_425006 91 24 of of IN 10_1101-2021_01_02_425006 91 25 its -PRON- PRP$ 10_1101-2021_01_02_425006 91 26 145 145 CD 10_1101-2021_01_02_425006 91 27 consecutive consecutive JJ 10_1101-2021_01_02_425006 91 28 genes gene NNS 10_1101-2021_01_02_425006 91 29 can can MD 10_1101-2021_01_02_425006 91 30 be be VB 10_1101-2021_01_02_425006 91 31 covered cover VBN 10_1101-2021_01_02_425006 91 32 by by IN 10_1101-2021_01_02_425006 91 33 at at RB 10_1101-2021_01_02_425006 91 34 least least RBS 
10_1101-2021_01_02_425006 91 35 one one CD 10_1101-2021_01_02_425006 91 36 ATU ATU NNP 10_1101-2021_01_02_425006 91 37 . . . 10_1101-2021_01_02_425006 92 1 Similar similar JJ 10_1101-2021_01_02_425006 92 2 to to IN 10_1101-2021_01_02_425006 92 3 ATUs atu NNS 10_1101-2021_01_02_425006 92 4 , , , 10_1101-2021_01_02_425006 92 5 maximal maximal JJ 10_1101-2021_01_02_425006 92 6 ATU ATU NNP 10_1101-2021_01_02_425006 92 7 clusters cluster NNS 10_1101-2021_01_02_425006 92 8 are be VBP 10_1101-2021_01_02_425006 92 9 also also RB 10_1101-2021_01_02_425006 92 10 146 146 CD 10_1101-2021_01_02_425006 92 11 dynamically dynamically RB 10_1101-2021_01_02_425006 92 12 composed compose VBN 10_1101-2021_01_02_425006 92 13 under under IN 10_1101-2021_01_02_425006 92 14 different different JJ 10_1101-2021_01_02_425006 92 15 conditions condition NNS 10_1101-2021_01_02_425006 92 16 or or CC 10_1101-2021_01_02_425006 92 17 environmental environmental JJ 10_1101-2021_01_02_425006 92 18 stimuli stimulus NNS 10_1101-2021_01_02_425006 92 19 in in IN 10_1101-2021_01_02_425006 92 20 bacterial bacterial JJ 10_1101-2021_01_02_425006 92 21 genomes genome NNS 10_1101-2021_01_02_425006 92 22 ( ( -LRB- 10_1101-2021_01_02_425006 92 23 5 5 CD 10_1101-2021_01_02_425006 92 24 , , , 10_1101-2021_01_02_425006 92 25 38 38 CD 10_1101-2021_01_02_425006 92 26 ) ) -RRB- 10_1101-2021_01_02_425006 92 27 . . . 
10_1101-2021_01_02_425006 93 1 147 147 CD 10_1101-2021_01_02_425006 93 2 Such such PDT 10_1101-2021_01_02_425006 93 3 a a DT 10_1101-2021_01_02_425006 93 4 maximal maximal JJ 10_1101-2021_01_02_425006 93 5 ATU atu NN 10_1101-2021_01_02_425006 93 6 cluster cluster NN 10_1101-2021_01_02_425006 93 7 can can MD 10_1101-2021_01_02_425006 93 8 be be VB 10_1101-2021_01_02_425006 93 9 used use VBN 10_1101-2021_01_02_425006 93 10 as as IN 10_1101-2021_01_02_425006 93 11 an an DT 10_1101-2021_01_02_425006 93 12 independent independent JJ 10_1101-2021_01_02_425006 93 13 genomic genomic JJ 10_1101-2021_01_02_425006 93 14 region region NN 10_1101-2021_01_02_425006 93 15 for for IN 10_1101-2021_01_02_425006 93 16 ATU ATU NNP 10_1101-2021_01_02_425006 93 17 prediction prediction NN 10_1101-2021_01_02_425006 93 18 , , , 10_1101-2021_01_02_425006 93 19 which which WDT 10_1101-2021_01_02_425006 93 20 148 148 CD 10_1101-2021_01_02_425006 93 21 alleviates alleviate VBZ 10_1101-2021_01_02_425006 93 22 the the DT 10_1101-2021_01_02_425006 93 23 difficulty difficulty NN 10_1101-2021_01_02_425006 93 24 in in IN 10_1101-2021_01_02_425006 93 25 computationally computationally RB 10_1101-2021_01_02_425006 93 26 predicting predict VBG 10_1101-2021_01_02_425006 93 27 ATUs atu NNS 10_1101-2021_01_02_425006 93 28 at at IN 10_1101-2021_01_02_425006 93 29 the the DT 10_1101-2021_01_02_425006 93 30 genome genome JJ 10_1101-2021_01_02_425006 93 31 scale scale NN 10_1101-2021_01_02_425006 93 32 . . . 
10_1101-2021_01_02_425006 94 1 The the DT 10_1101-2021_01_02_425006 94 2 output output NN 10_1101-2021_01_02_425006 94 3 of of IN 10_1101-2021_01_02_425006 94 4 our -PRON- PRP$ 10_1101-2021_01_02_425006 94 5 in-149 in-149 NNP 10_1101-2021_01_02_425006 94 6 house house NN 10_1101-2021_01_02_425006 94 7 tool tool NN 10_1101-2021_01_02_425006 94 8 rSeqTU rseqtu PRP 10_1101-2021_01_02_425006 94 9 can can MD 10_1101-2021_01_02_425006 94 10 serve serve VB 10_1101-2021_01_02_425006 94 11 as as IN 10_1101-2021_01_02_425006 94 12 the the DT 10_1101-2021_01_02_425006 94 13 maximal maximal JJ 10_1101-2021_01_02_425006 94 14 ATU ATU NNP 10_1101-2021_01_02_425006 94 15 cluster cluster NN 10_1101-2021_01_02_425006 94 16 data datum NNS 10_1101-2021_01_02_425006 94 17 , , , 10_1101-2021_01_02_425006 94 18 which which WDT 10_1101-2021_01_02_425006 94 19 lays lay VBZ 10_1101-2021_01_02_425006 94 20 a a DT 10_1101-2021_01_02_425006 94 21 solid solid JJ 10_1101-2021_01_02_425006 94 22 foundation foundation NN 10_1101-2021_01_02_425006 94 23 for for IN 10_1101-2021_01_02_425006 94 24 ATU ATU NNP 10_1101-2021_01_02_425006 94 25 150 150 CD 10_1101-2021_01_02_425006 94 26 prediction prediction NN 10_1101-2021_01_02_425006 94 27 ( ( -LRB- 10_1101-2021_01_02_425006 94 28 5 5 CD 10_1101-2021_01_02_425006 94 29 ) ) -RRB- 10_1101-2021_01_02_425006 94 30 . . . 
10_1101-2021_01_02_425006 95 1 We -PRON- PRP 10_1101-2021_01_02_425006 95 2 modified modify VBD 10_1101-2021_01_02_425006 95 3 the the DT 10_1101-2021_01_02_425006 95 4 maximal maximal JJ 10_1101-2021_01_02_425006 95 5 ATU ATU NNP 10_1101-2021_01_02_425006 95 6 clusters cluster NNS 10_1101-2021_01_02_425006 95 7 from from IN 10_1101-2021_01_02_425006 95 8 rSeqTU rseqtu CD 10_1101-2021_01_02_425006 95 9 : : : 10_1101-2021_01_02_425006 95 10 ( ( -LRB- 10_1101-2021_01_02_425006 95 11 i i NN 10_1101-2021_01_02_425006 95 12 ) ) -RRB- 10_1101-2021_01_02_425006 95 13 two two CD 10_1101-2021_01_02_425006 95 14 maximal maximal JJ 10_1101-2021_01_02_425006 95 15 ATU ATU NNP 10_1101-2021_01_02_425006 95 16 clusters cluster NNS 10_1101-2021_01_02_425006 95 17 with with IN 10_1101-2021_01_02_425006 95 18 151 151 CD 10_1101-2021_01_02_425006 95 19 distances distance NNS 10_1101-2021_01_02_425006 95 20 less less JJR 10_1101-2021_01_02_425006 95 21 than than IN 10_1101-2021_01_02_425006 95 22 40 40 CD 10_1101-2021_01_02_425006 95 23 bp bp NN 10_1101-2021_01_02_425006 95 24 were be VBD 10_1101-2021_01_02_425006 95 25 combined combine VBN 10_1101-2021_01_02_425006 95 26 into into IN 10_1101-2021_01_02_425006 95 27 one one CD 10_1101-2021_01_02_425006 95 28 cluster cluster NN 10_1101-2021_01_02_425006 95 29 and and CC 10_1101-2021_01_02_425006 95 30 ( ( -LRB- 10_1101-2021_01_02_425006 95 31 ii ii NN 10_1101-2021_01_02_425006 95 32 ) ) -RRB- 10_1101-2021_01_02_425006 95 33 a a DT 10_1101-2021_01_02_425006 95 34 maximal maximal JJ 10_1101-2021_01_02_425006 95 35 ATU ATU NNP 10_1101-2021_01_02_425006 95 36 cluster cluster NN 10_1101-2021_01_02_425006 95 37 was be VBD 10_1101-2021_01_02_425006 95 38 split split VBN 10_1101-2021_01_02_425006 95 39 at at IN 10_1101-2021_01_02_425006 95 40 the the DT 10_1101-2021_01_02_425006 95 41 152 152 CD 10_1101-2021_01_02_425006 95 42 intergenic intergenic JJ 10_1101-2021_01_02_425006 95 43 region region NN 10_1101-2021_01_02_425006 95 44 where where 
WRB 10_1101-2021_01_02_425006 95 45 the the DT 10_1101-2021_01_02_425006 95 46 opposite opposite JJ 10_1101-2021_01_02_425006 95 47 - - HYPH 10_1101-2021_01_02_425006 95 48 strand strand NN 10_1101-2021_01_02_425006 95 49 genes gene NNS 10_1101-2021_01_02_425006 95 50 were be VBD 10_1101-2021_01_02_425006 95 51 located locate VBN 10_1101-2021_01_02_425006 95 52 . . . 10_1101-2021_01_02_425006 96 1 In in IN 10_1101-2021_01_02_425006 96 2 addition addition NN 10_1101-2021_01_02_425006 96 3 , , , 10_1101-2021_01_02_425006 96 4 we -PRON- PRP 10_1101-2021_01_02_425006 96 5 selected select VBD 10_1101-2021_01_02_425006 96 6 the the DT 10_1101-2021_01_02_425006 96 7 maximal maximal JJ 10_1101-2021_01_02_425006 96 8 153 153 CD 10_1101-2021_01_02_425006 96 9 ATU ATU NNP 10_1101-2021_01_02_425006 96 10 clusters cluster NNS 10_1101-2021_01_02_425006 96 11 with with IN 10_1101-2021_01_02_425006 96 12 expression expression NN 10_1101-2021_01_02_425006 96 13 values value NNS 10_1101-2021_01_02_425006 96 14 over over IN 10_1101-2021_01_02_425006 96 15 ten ten CD 10_1101-2021_01_02_425006 96 16 ( ( -LRB- 10_1101-2021_01_02_425006 96 17 see see VB 10_1101-2021_01_02_425006 96 18 the the DT 10_1101-2021_01_02_425006 96 19 details detail NNS 10_1101-2021_01_02_425006 96 20 in in IN 10_1101-2021_01_02_425006 96 21 method method NNP 10_1101-2021_01_02_425006 96 22 S4 S4 NNP 10_1101-2021_01_02_425006 96 23 ) ) -RRB- 10_1101-2021_01_02_425006 96 24 , , , 10_1101-2021_01_02_425006 96 25 according accord VBG 10_1101-2021_01_02_425006 96 26 to to IN 10_1101-2021_01_02_425006 96 27 the the DT 10_1101-2021_01_02_425006 96 28 study study NN 10_1101-2021_01_02_425006 96 29 of of IN 10_1101-2021_01_02_425006 96 30 154 154 CD 10_1101-2021_01_02_425006 96 31 Etwiller Etwiller NNP 10_1101-2021_01_02_425006 96 32 et et NNP 10_1101-2021_01_02_425006 96 33 al al NNP 10_1101-2021_01_02_425006 96 34 . . . 
10_1101-2021_01_02_425006 97 1 ( ( -LRB- 10_1101-2021_01_02_425006 97 2 13 13 CD 10_1101-2021_01_02_425006 97 3 ) ) -RRB- 10_1101-2021_01_02_425006 97 4 . . . 10_1101-2021_01_02_425006 98 1 155 155 CD 10_1101-2021_01_02_425006 98 2 The the DT 10_1101-2021_01_02_425006 98 3 mathematical mathematical JJ 10_1101-2021_01_02_425006 98 4 programming programming NN 10_1101-2021_01_02_425006 98 5 model model NN 10_1101-2021_01_02_425006 98 6 for for IN 10_1101-2021_01_02_425006 98 7 ATU ATU NNP 10_1101-2021_01_02_425006 98 8 prediction prediction NN 10_1101-2021_01_02_425006 98 9 156 156 CD 10_1101-2021_01_02_425006 98 10 The the DT 10_1101-2021_01_02_425006 98 11 predicted predicted JJ 10_1101-2021_01_02_425006 98 12 ATU ATU NNP 10_1101-2021_01_02_425006 98 13 expression expression NN 10_1101-2021_01_02_425006 98 14 profile profile NN 10_1101-2021_01_02_425006 98 15 should should MD 10_1101-2021_01_02_425006 98 16 be be VB 10_1101-2021_01_02_425006 98 17 consistent consistent JJ 10_1101-2021_01_02_425006 98 18 with with IN 10_1101-2021_01_02_425006 98 19 the the DT 10_1101-2021_01_02_425006 98 20 observed observed JJ 10_1101-2021_01_02_425006 98 21 expression expression NN 10_1101-2021_01_02_425006 98 22 profiles profile NNS 10_1101-2021_01_02_425006 98 23 of of IN 10_1101-2021_01_02_425006 98 24 the the DT 10_1101-2021_01_02_425006 98 25 157 157 CD 10_1101-2021_01_02_425006 98 26 .CC .CC : 10_1101-2021_01_02_425006 98 27 - - HYPH 10_1101-2021_01_02_425006 98 28 BY by IN 10_1101-2021_01_02_425006 98 29 - - HYPH 10_1101-2021_01_02_425006 98 30 NC NC NNP 10_1101-2021_01_02_425006 98 31 - - HYPH 10_1101-2021_01_02_425006 98 32 ND ND NNP 10_1101-2021_01_02_425006 98 33 4.0 4.0 CD 10_1101-2021_01_02_425006 98 34 International International NNP 10_1101-2021_01_02_425006 98 35 licenseavailable licenseavailable NN 10_1101-2021_01_02_425006 98 36 under under IN 10_1101-2021_01_02_425006 98 37 a a DT 10_1101-2021_01_02_425006 98 38 ( ( -LRB- 10_1101-2021_01_02_425006 98 39 which 
which WDT 10_1101-2021_01_02_425006 98 40 was be VBD 10_1101-2021_01_02_425006 98 41 not not RB 10_1101-2021_01_02_425006 98 42 certified certify VBN 10_1101-2021_01_02_425006 98 43 by by IN 10_1101-2021_01_02_425006 98 44 peer peer NN 10_1101-2021_01_02_425006 98 45 review review NN 10_1101-2021_01_02_425006 98 46 ) ) -RRB- 10_1101-2021_01_02_425006 98 47 is be VBZ 10_1101-2021_01_02_425006 98 48 the the DT 10_1101-2021_01_02_425006 98 49 author author NN 10_1101-2021_01_02_425006 98 50 / / SYM 10_1101-2021_01_02_425006 98 51 funder funder NN 10_1101-2021_01_02_425006 98 52 , , , 10_1101-2021_01_02_425006 98 53 who who WP 10_1101-2021_01_02_425006 98 54 has have VBZ 10_1101-2021_01_02_425006 98 55 granted grant VBN 10_1101-2021_01_02_425006 98 56 bioRxiv biorxiv IN 10_1101-2021_01_02_425006 98 57 a a DT 10_1101-2021_01_02_425006 98 58 license license NN 10_1101-2021_01_02_425006 98 59 to to TO 10_1101-2021_01_02_425006 98 60 display display VB 10_1101-2021_01_02_425006 98 61 the the DT 10_1101-2021_01_02_425006 98 62 preprint preprint NN 10_1101-2021_01_02_425006 98 63 in in IN 10_1101-2021_01_02_425006 98 64 perpetuity perpetuity NN 10_1101-2021_01_02_425006 98 65 . . . 10_1101-2021_01_02_425006 99 1 It -PRON- PRP 10_1101-2021_01_02_425006 99 2 is be VBZ 10_1101-2021_01_02_425006 99 3 made make VBN 10_1101-2021_01_02_425006 99 4 The the DT 10_1101-2021_01_02_425006 99 5 copyright copyright NN 10_1101-2021_01_02_425006 99 6 holder holder NN 10_1101-2021_01_02_425006 99 7 for for IN 10_1101-2021_01_02_425006 99 8 this this DT 10_1101-2021_01_02_425006 99 9 preprintthis preprintthis NN 10_1101-2021_01_02_425006 99 10 version version NN 10_1101-2021_01_02_425006 99 11 posted post VBD 10_1101-2021_01_02_425006 99 12 January January NNP 10_1101-2021_01_02_425006 99 13 6 6 CD 10_1101-2021_01_02_425006 99 14 , , , 10_1101-2021_01_02_425006 99 15 2021 2021 CD 10_1101-2021_01_02_425006 99 16 . . . 
10_1101-2021_01_02_425006 99 17 ; ; : 10_1101-2021_01_02_425006 99 18 https://doi.org/10.1101/2021.01.02.425006doi https://doi.org/10.1101/2021.01.02.425006doi XX 10_1101-2021_01_02_425006 99 19 : : : 10_1101-2021_01_02_425006 99 20 bioRxiv biorxiv VB 10_1101-2021_01_02_425006 99 21 preprint preprint NN 10_1101-2021_01_02_425006 99 22 https://doi.org/10.1101/2021.01.02.425006 https://doi.org/10.1101/2021.01.02.425006 CD 10_1101-2021_01_02_425006 99 23 http://creativecommons.org/licenses/by-nc-nd/4.0/ http://creativecommons.org/licenses/by-nc-nd/4.0/ CD 10_1101-2021_01_02_425006 99 24 10 10 CD 10_1101-2021_01_02_425006 99 25 genetic genetic JJ 10_1101-2021_01_02_425006 99 26 and and CC 10_1101-2021_01_02_425006 99 27 intergenic intergenic JJ 10_1101-2021_01_02_425006 99 28 regions region NNS 10_1101-2021_01_02_425006 99 29 . . . 10_1101-2021_01_02_425006 100 1 Therefore therefore RB 10_1101-2021_01_02_425006 100 2 , , , 10_1101-2021_01_02_425006 100 3 the the DT 10_1101-2021_01_02_425006 100 4 prediction prediction NN 10_1101-2021_01_02_425006 100 5 of of IN 10_1101-2021_01_02_425006 100 6 the the DT 10_1101-2021_01_02_425006 100 7 ATU ATU NNP 10_1101-2021_01_02_425006 100 8 profiles profile NNS 10_1101-2021_01_02_425006 100 9 can can MD 10_1101-2021_01_02_425006 100 10 be be VB 10_1101-2021_01_02_425006 100 11 modeled model VBN 10_1101-2021_01_02_425006 100 12 as as IN 10_1101-2021_01_02_425006 100 13 an an DT 10_1101-2021_01_02_425006 100 14 158 158 CD 10_1101-2021_01_02_425006 100 15 optimization optimization NN 10_1101-2021_01_02_425006 100 16 problem problem NN 10_1101-2021_01_02_425006 100 17 , , , 10_1101-2021_01_02_425006 100 18 which which WDT 10_1101-2021_01_02_425006 100 19 seeks seek VBZ 10_1101-2021_01_02_425006 100 20 an an DT 10_1101-2021_01_02_425006 100 21 optimum optimum JJ 10_1101-2021_01_02_425006 100 22 expression expression NN 10_1101-2021_01_02_425006 100 23 combination combination NN 10_1101-2021_01_02_425006 100 24 of of IN 
10_1101-2021_01_02_425006 100 25 all all DT 10_1101-2021_01_02_425006 100 26 of of IN 10_1101-2021_01_02_425006 100 27 the the DT 10_1101-2021_01_02_425006 100 28 to to TO 10_1101-2021_01_02_425006 100 29 - - HYPH 10_1101-2021_01_02_425006 100 30 be be VB 10_1101-2021_01_02_425006 100 31 - - HYPH 10_1101-2021_01_02_425006 100 32 identified identify VBN 10_1101-2021_01_02_425006 100 33 159 159 CD 10_1101-2021_01_02_425006 100 34 ATUs atu NNS 10_1101-2021_01_02_425006 100 35 to to TO 10_1101-2021_01_02_425006 100 36 minimize minimize VB 10_1101-2021_01_02_425006 100 37 the the DT 10_1101-2021_01_02_425006 100 38 gap gap NN 10_1101-2021_01_02_425006 100 39 between between IN 10_1101-2021_01_02_425006 100 40 the the DT 10_1101-2021_01_02_425006 100 41 predicted predict VBN 10_1101-2021_01_02_425006 100 42 ATUs atu NNS 10_1101-2021_01_02_425006 100 43 and and CC 10_1101-2021_01_02_425006 100 44 the the DT 10_1101-2021_01_02_425006 100 45 observed observe VBN 10_1101-2021_01_02_425006 100 46 genetic genetic JJ 10_1101-2021_01_02_425006 100 47 and and CC 10_1101-2021_01_02_425006 100 48 intergenic intergenic JJ 10_1101-2021_01_02_425006 100 49 region region NN 10_1101-2021_01_02_425006 100 50 160 160 CD 10_1101-2021_01_02_425006 100 51 expression expression NN 10_1101-2021_01_02_425006 100 52 profiles profile NNS 10_1101-2021_01_02_425006 100 53 . . . 
10_1101-2021_01_02_425006 101 1 Here here RB 10_1101-2021_01_02_425006 101 2 , , , 10_1101-2021_01_02_425006 101 3 a a DT 10_1101-2021_01_02_425006 101 4 convex convex NNP 10_1101-2021_01_02_425006 101 5 quadratic quadratic NNP 10_1101-2021_01_02_425006 101 6 programming programming NN 10_1101-2021_01_02_425006 101 7 model model NN 10_1101-2021_01_02_425006 101 8 was be VBD 10_1101-2021_01_02_425006 101 9 built build VBN 10_1101-2021_01_02_425006 101 10 to to TO 10_1101-2021_01_02_425006 101 11 solve solve VB 10_1101-2021_01_02_425006 101 12 this this DT 10_1101-2021_01_02_425006 101 13 optimization optimization NN 10_1101-2021_01_02_425006 101 14 161 161 CD 10_1101-2021_01_02_425006 101 15 problem problem NN 10_1101-2021_01_02_425006 101 16 . . . 10_1101-2021_01_02_425006 102 1 162 162 CD 10_1101-2021_01_02_425006 102 2 We -PRON- PRP 10_1101-2021_01_02_425006 102 3 denoted denote VBD 10_1101-2021_01_02_425006 102 4 a a DT 10_1101-2021_01_02_425006 102 5 maximal maximal JJ 10_1101-2021_01_02_425006 102 6 ATU ATU NNP 10_1101-2021_01_02_425006 102 7 cluster cluster NN 10_1101-2021_01_02_425006 102 8 as as IN 10_1101-2021_01_02_425006 102 9 � � NNP 10_1101-2021_01_02_425006 102 10 , , , 10_1101-2021_01_02_425006 102 11 assuming assume VBG 10_1101-2021_01_02_425006 102 12 that that IN 10_1101-2021_01_02_425006 102 13 it -PRON- PRP 10_1101-2021_01_02_425006 102 14 contains contain VBZ 10_1101-2021_01_02_425006 102 15 the the DT 10_1101-2021_01_02_425006 102 16 consecutive consecutive JJ 10_1101-2021_01_02_425006 102 17 genes gene NNS 10_1101-2021_01_02_425006 102 18 163 163 CD 10_1101-2021_01_02_425006 102 19 { { -LRB- 10_1101-2021_01_02_425006 102 20 � � NNP 10_1101-2021_01_02_425006 102 21 � � NNP 10_1101-2021_01_02_425006 102 22 , , , 10_1101-2021_01_02_425006 102 23 … … NFP 10_1101-2021_01_02_425006 102 24 , , , 10_1101-2021_01_02_425006 102 25 � � NNP 10_1101-2021_01_02_425006 102 26 � � NNP 10_1101-2021_01_02_425006 102 27 } } -RRB- 10_1101-2021_01_02_425006 102 
28 , , , 10_1101-2021_01_02_425006 102 29 and and CC 10_1101-2021_01_02_425006 102 30 the the DT 10_1101-2021_01_02_425006 102 31 intergenic intergenic JJ 10_1101-2021_01_02_425006 102 32 regions region NNS 10_1101-2021_01_02_425006 102 33 of of IN 10_1101-2021_01_02_425006 102 34 these these DT 10_1101-2021_01_02_425006 102 35 genes gene NNS 10_1101-2021_01_02_425006 102 36 are be VBP 10_1101-2021_01_02_425006 102 37 { { -LRB- 10_1101-2021_01_02_425006 102 38 � � NNP 10_1101-2021_01_02_425006 102 39 � � NNP 10_1101-2021_01_02_425006 102 40 , , , 10_1101-2021_01_02_425006 102 41 � � NNP 10_1101-2021_01_02_425006 102 42 , , , 10_1101-2021_01_02_425006 102 43 … … NFP 10_1101-2021_01_02_425006 102 44 , , , 10_1101-2021_01_02_425006 102 45 � � NNP 10_1101-2021_01_02_425006 102 46 � � NNP 10_1101-2021_01_02_425006 102 47 � � NNP 10_1101-2021_01_02_425006 102 48 � � NNP 10_1101-2021_01_02_425006 102 49 , , , 10_1101-2021_01_02_425006 102 50 � � NNP 10_1101-2021_01_02_425006 102 51 } } -RRB- 10_1101-2021_01_02_425006 102 52 . . . 10_1101-2021_01_02_425006 103 1 The the DT 10_1101-2021_01_02_425006 103 2 size size NN 10_1101-2021_01_02_425006 103 3 of of IN 10_1101-2021_01_02_425006 103 4 � � NNP 10_1101-2021_01_02_425006 103 5 is be VBZ 10_1101-2021_01_02_425006 103 6 defined define VBN 10_1101-2021_01_02_425006 103 7 as as IN 10_1101-2021_01_02_425006 103 8 164 164 CD 10_1101-2021_01_02_425006 103 9 the the DT 10_1101-2021_01_02_425006 103 10 number number NN 10_1101-2021_01_02_425006 103 11 of of IN 10_1101-2021_01_02_425006 103 12 its -PRON- PRP$ 10_1101-2021_01_02_425006 103 13 component component NN 10_1101-2021_01_02_425006 103 14 genes gene NNS 10_1101-2021_01_02_425006 103 15 � � NNP 10_1101-2021_01_02_425006 103 16 . . . 
10_1101-2021_01_02_425006 104 1 Theoretically theoretically RB 10_1101-2021_01_02_425006 104 2 , , , 10_1101-2021_01_02_425006 104 3 there there EX 10_1101-2021_01_02_425006 104 4 are be VBP 10_1101-2021_01_02_425006 104 5 � � , 10_1101-2021_01_02_425006 104 6 × × NFP 10_1101-2021_01_02_425006 104 7 ( ( -LRB- 10_1101-2021_01_02_425006 104 8 � � NNP 10_1101-2021_01_02_425006 104 9 � � NNP 10_1101-2021_01_02_425006 104 10 � � NNP 10_1101-2021_01_02_425006 104 11 ) ) -RRB- 10_1101-2021_01_02_425006 104 12 � � . 10_1101-2021_01_02_425006 104 13 ATUs atu NNS 10_1101-2021_01_02_425006 104 14 for for IN 10_1101-2021_01_02_425006 104 15 � � NNP 10_1101-2021_01_02_425006 104 16 , , , 10_1101-2021_01_02_425006 104 17 and and CC 10_1101-2021_01_02_425006 104 18 an an DT 10_1101-2021_01_02_425006 104 19 ATU ATU NNP 10_1101-2021_01_02_425006 104 20 with with IN 10_1101-2021_01_02_425006 104 21 165 165 CD 10_1101-2021_01_02_425006 104 22 consecutive consecutive JJ 10_1101-2021_01_02_425006 104 23 genes gene NNS 10_1101-2021_01_02_425006 104 24 { { -LRB- 10_1101-2021_01_02_425006 104 25 � � NNP 10_1101-2021_01_02_425006 104 26 � � NNP 10_1101-2021_01_02_425006 104 27 , , , 10_1101-2021_01_02_425006 104 28 � � NNP 10_1101-2021_01_02_425006 104 29 � � NNP 10_1101-2021_01_02_425006 104 30 � � NNP 10_1101-2021_01_02_425006 104 31 � � NNP 10_1101-2021_01_02_425006 104 32 , , , 10_1101-2021_01_02_425006 104 33 … … NFP 10_1101-2021_01_02_425006 104 34 , , , 10_1101-2021_01_02_425006 104 35 � � NNP 10_1101-2021_01_02_425006 104 36 � � . 
10_1101-2021_01_02_425006 104 37 } } -RRB- 10_1101-2021_01_02_425006 104 38 is be VBZ 10_1101-2021_01_02_425006 104 39 denoted denote VBN 10_1101-2021_01_02_425006 104 40 as as IN 10_1101-2021_01_02_425006 104 41 � � NNP 10_1101-2021_01_02_425006 104 42 � � NNP 10_1101-2021_01_02_425006 104 43 , , , 10_1101-2021_01_02_425006 104 44 � � NNP 10_1101-2021_01_02_425006 104 45 ; ; : 10_1101-2021_01_02_425006 104 46 the the DT 10_1101-2021_01_02_425006 104 47 corresponding corresponding JJ 10_1101-2021_01_02_425006 104 48 expression expression NN 10_1101-2021_01_02_425006 104 49 value value NN 10_1101-2021_01_02_425006 104 50 is be VBZ 10_1101-2021_01_02_425006 104 51 � � NNP 10_1101-2021_01_02_425006 104 52 � � NNP 10_1101-2021_01_02_425006 104 53 , , , 10_1101-2021_01_02_425006 104 54 � � NNP 10_1101-2021_01_02_425006 104 55 , , , 10_1101-2021_01_02_425006 104 56 1 1 CD 10_1101-2021_01_02_425006 104 57 ≤166 ≤166 NN 10_1101-2021_01_02_425006 104 58 � � NNP 10_1101-2021_01_02_425006 104 59 ≤ ≤ NNP 10_1101-2021_01_02_425006 104 60 � � NNP 10_1101-2021_01_02_425006 104 61 ≤ ≤ NN 10_1101-2021_01_02_425006 104 62 � � NNP 10_1101-2021_01_02_425006 104 63 . . . 10_1101-2021_01_02_425006 105 1 167 167 CD 10_1101-2021_01_02_425006 105 2 For for IN 10_1101-2021_01_02_425006 105 3 the the DT 10_1101-2021_01_02_425006 105 4 component component NN 10_1101-2021_01_02_425006 105 5 gene gene NN 10_1101-2021_01_02_425006 105 6 � � NNP 10_1101-2021_01_02_425006 105 7 � � NNP 10_1101-2021_01_02_425006 105 8 of of IN 10_1101-2021_01_02_425006 105 9 � � NNP 10_1101-2021_01_02_425006 105 10 , , , 10_1101-2021_01_02_425006 105 11 the the DT 10_1101-2021_01_02_425006 105 12 gap gap NN 10_1101-2021_01_02_425006 105 13 between between IN 10_1101-2021_01_02_425006 105 14 the the DT 10_1101-2021_01_02_425006 105 15 gene gene NN 10_1101-2021_01_02_425006 105 16 expression expression NN 10_1101-2021_01_02_425006 105 17 value value NN 10_1101-2021_01_02_425006 105 18 � � . 
10_1101-2021_01_02_425006 105 19 � � NNP 10_1101-2021_01_02_425006 105 20 and and CC 10_1101-2021_01_02_425006 105 21 the the DT 10_1101-2021_01_02_425006 105 22 sum sum NN 10_1101-2021_01_02_425006 105 23 of of IN 10_1101-2021_01_02_425006 105 24 the the DT 10_1101-2021_01_02_425006 105 25 168 168 CD 10_1101-2021_01_02_425006 105 26 expression expression NN 10_1101-2021_01_02_425006 105 27 level level NN 10_1101-2021_01_02_425006 105 28 of of IN 10_1101-2021_01_02_425006 105 29 the the DT 10_1101-2021_01_02_425006 105 30 ATUs atu NNS 10_1101-2021_01_02_425006 105 31 containing contain VBG 10_1101-2021_01_02_425006 105 32 it -PRON- PRP 10_1101-2021_01_02_425006 105 33 is be VBZ 10_1101-2021_01_02_425006 105 34 denoted denote VBN 10_1101-2021_01_02_425006 105 35 as as IN 10_1101-2021_01_02_425006 105 36 � � NNP 10_1101-2021_01_02_425006 105 37 � � NNP 10_1101-2021_01_02_425006 105 38 , , , 10_1101-2021_01_02_425006 105 39 which which WDT 10_1101-2021_01_02_425006 105 40 provides provide VBZ 10_1101-2021_01_02_425006 105 41 the the DT 10_1101-2021_01_02_425006 105 42 first first JJ 10_1101-2021_01_02_425006 105 43 � � NNP 10_1101-2021_01_02_425006 105 44 equality equality NN 10_1101-2021_01_02_425006 105 45 169 169 CD 10_1101-2021_01_02_425006 105 46 constraints constraint NNS 10_1101-2021_01_02_425006 105 47 in in IN 10_1101-2021_01_02_425006 105 48 our -PRON- PRP$ 10_1101-2021_01_02_425006 105 49 mathematical mathematical JJ 10_1101-2021_01_02_425006 105 50 programming programming NN 10_1101-2021_01_02_425006 105 51 model model NN 10_1101-2021_01_02_425006 105 52 , , , 10_1101-2021_01_02_425006 105 53 � � NNP 10_1101-2021_01_02_425006 105 54 = = SYM 10_1101-2021_01_02_425006 105 55 1,2 1,2 CD 10_1101-2021_01_02_425006 105 56 , , , 10_1101-2021_01_02_425006 105 57 … … NFP 10_1101-2021_01_02_425006 105 58 , , , 10_1101-2021_01_02_425006 105 59 � � NNP 10_1101-2021_01_02_425006 105 60 . . . 
10_1101-2021_01_02_425006 106 1 Similarly similarly RB 10_1101-2021_01_02_425006 106 2 , , , 10_1101-2021_01_02_425006 106 3 for for IN 10_1101-2021_01_02_425006 106 4 the the DT 10_1101-2021_01_02_425006 106 5 intergenic intergenic JJ 10_1101-2021_01_02_425006 106 6 region region NN 10_1101-2021_01_02_425006 106 7 170 170 CD 10_1101-2021_01_02_425006 106 8 � � NNP 10_1101-2021_01_02_425006 106 9 � � NNP 10_1101-2021_01_02_425006 106 10 , , , 10_1101-2021_01_02_425006 106 11 � � NNP 10_1101-2021_01_02_425006 106 12 � � NNS 10_1101-2021_01_02_425006 106 13 � � NN 10_1101-2021_01_02_425006 106 14 of of IN 10_1101-2021_01_02_425006 106 15 � � NNP 10_1101-2021_01_02_425006 106 16 , , , 10_1101-2021_01_02_425006 106 17 the the DT 10_1101-2021_01_02_425006 106 18 gap gap NN 10_1101-2021_01_02_425006 106 19 between between IN 10_1101-2021_01_02_425006 106 20 the the DT 10_1101-2021_01_02_425006 106 21 intergenic intergenic JJ 10_1101-2021_01_02_425006 106 22 region region NN 10_1101-2021_01_02_425006 106 23 expression expression NN 10_1101-2021_01_02_425006 106 24 value value NN 10_1101-2021_01_02_425006 106 25 � � NNP 10_1101-2021_01_02_425006 106 26 � � NNP 10_1101-2021_01_02_425006 106 27 , , , 10_1101-2021_01_02_425006 106 28 � � NNP 10_1101-2021_01_02_425006 106 29 � � NNP 10_1101-2021_01_02_425006 106 30 � � NNP 10_1101-2021_01_02_425006 106 31 and and CC 10_1101-2021_01_02_425006 106 32 the the DT 10_1101-2021_01_02_425006 106 33 sum sum NN 10_1101-2021_01_02_425006 106 34 of of IN 10_1101-2021_01_02_425006 106 35 the the DT 10_1101-2021_01_02_425006 106 36 171 171 CD 10_1101-2021_01_02_425006 106 37 expression expression NN 10_1101-2021_01_02_425006 106 38 level level NN 10_1101-2021_01_02_425006 106 39 of of IN 10_1101-2021_01_02_425006 106 40 the the DT 10_1101-2021_01_02_425006 106 41 ATUs atu NNS 10_1101-2021_01_02_425006 106 42 containing contain VBG 10_1101-2021_01_02_425006 106 43 it -PRON- PRP 10_1101-2021_01_02_425006 106 44 is be VBZ 
10_1101-2021_01_02_425006 106 45 denoted denote VBN 10_1101-2021_01_02_425006 106 46 as as IN 10_1101-2021_01_02_425006 106 47 � � NNP 10_1101-2021_01_02_425006 106 48 � � NNP 10_1101-2021_01_02_425006 106 49 , , , 10_1101-2021_01_02_425006 106 50 providing provide VBG 10_1101-2021_01_02_425006 106 51 the the DT 10_1101-2021_01_02_425006 106 52 last last JJ 10_1101-2021_01_02_425006 106 53 � � NNP 10_1101-2021_01_02_425006 106 54 − − NNP 10_1101-2021_01_02_425006 106 55 1 1 CD 10_1101-2021_01_02_425006 106 56 equality equality NN 10_1101-2021_01_02_425006 106 57 172 172 CD 10_1101-2021_01_02_425006 106 58 constraints constraint NNS 10_1101-2021_01_02_425006 106 59 in in IN 10_1101-2021_01_02_425006 106 60 our -PRON- PRP$ 10_1101-2021_01_02_425006 106 61 mathematical mathematical JJ 10_1101-2021_01_02_425006 106 62 programming programming NN 10_1101-2021_01_02_425006 106 63 model model NN 10_1101-2021_01_02_425006 106 64 , , , 10_1101-2021_01_02_425006 106 65 � � NNP 10_1101-2021_01_02_425006 106 66 = = SYM 10_1101-2021_01_02_425006 106 67 1,2 1,2 CD 10_1101-2021_01_02_425006 106 68 , , , 10_1101-2021_01_02_425006 106 69 … … NFP 10_1101-2021_01_02_425006 106 70 , , , 10_1101-2021_01_02_425006 106 71 � � NNP 10_1101-2021_01_02_425006 106 72 − − NNP 10_1101-2021_01_02_425006 106 73 1 1 CD 10_1101-2021_01_02_425006 106 74 . . . 
10_1101-2021_01_02_425006 107 1 173 173 CD 10_1101-2021_01_02_425006 107 2 The the DT 10_1101-2021_01_02_425006 107 3 goal goal NN 10_1101-2021_01_02_425006 107 4 of of IN 10_1101-2021_01_02_425006 107 5 our -PRON- PRP$ 10_1101-2021_01_02_425006 107 6 mathematical mathematical JJ 10_1101-2021_01_02_425006 107 7 programming programming NN 10_1101-2021_01_02_425006 107 8 model model NN 10_1101-2021_01_02_425006 107 9 is be VBZ 10_1101-2021_01_02_425006 107 10 to to TO 10_1101-2021_01_02_425006 107 11 minimize minimize VB 10_1101-2021_01_02_425006 107 12 the the DT 10_1101-2021_01_02_425006 107 13 square square NN 10_1101-2021_01_02_425006 107 14 of of IN 10_1101-2021_01_02_425006 107 15 � � NNP 10_1101-2021_01_02_425006 107 16 = = SYM 10_1101-2021_01_02_425006 107 17 174 174 CD 10_1101-2021_01_02_425006 107 18 ( ( -LRB- 10_1101-2021_01_02_425006 107 19 � � NNP 10_1101-2021_01_02_425006 107 20 � � NNP 10_1101-2021_01_02_425006 107 21 , , , 10_1101-2021_01_02_425006 107 22 � � NNP 10_1101-2021_01_02_425006 107 23 � � NNP 10_1101-2021_01_02_425006 107 24 , , , 10_1101-2021_01_02_425006 107 25 … … NFP 10_1101-2021_01_02_425006 107 26 , , , 10_1101-2021_01_02_425006 107 27 � � NNP 10_1101-2021_01_02_425006 107 28 � � NNP 10_1101-2021_01_02_425006 107 29 , , , 10_1101-2021_01_02_425006 107 30 � � NNP 10_1101-2021_01_02_425006 107 31 � � NNP 10_1101-2021_01_02_425006 107 32 , , , 10_1101-2021_01_02_425006 107 33 … … NFP 10_1101-2021_01_02_425006 107 34 , , , 10_1101-2021_01_02_425006 107 35 � � NNP 10_1101-2021_01_02_425006 107 36 � � ADD 10_1101-2021_01_02_425006 107 37 � � NNP 10_1101-2021_01_02_425006 107 38 � � NNP 10_1101-2021_01_02_425006 107 39 ) ) -RRB- 10_1101-2021_01_02_425006 107 40 , , , 10_1101-2021_01_02_425006 107 41 as as IN 10_1101-2021_01_02_425006 107 42 the the DT 10_1101-2021_01_02_425006 107 43 combination combination NN 10_1101-2021_01_02_425006 107 44 of of IN 10_1101-2021_01_02_425006 107 45 � � NNP 10_1101-2021_01_02_425006 107 46 � � NNP 
10_1101-2021_01_02_425006 107 47 , , , 10_1101-2021_01_02_425006 107 48 � � NNP 10_1101-2021_01_02_425006 107 49 with with IN 10_1101-2021_01_02_425006 107 50 a a DT 10_1101-2021_01_02_425006 107 51 minimal minimal JJ 10_1101-2021_01_02_425006 107 52 value value NN 10_1101-2021_01_02_425006 107 53 of of IN 10_1101-2021_01_02_425006 107 54 � � NNP 10_1101-2021_01_02_425006 107 55 � � NNP 10_1101-2021_01_02_425006 107 56 � � NNP 10_1101-2021_01_02_425006 107 57 is be VBZ 10_1101-2021_01_02_425006 107 58 corresponding correspond VBG 10_1101-2021_01_02_425006 107 59 to to IN 10_1101-2021_01_02_425006 107 60 175 175 CD 10_1101-2021_01_02_425006 107 61 an an DT 10_1101-2021_01_02_425006 107 62 optimum optimum JJ 10_1101-2021_01_02_425006 107 63 expression expression NN 10_1101-2021_01_02_425006 107 64 combination combination NN 10_1101-2021_01_02_425006 107 65 of of IN 10_1101-2021_01_02_425006 107 66 all all DT 10_1101-2021_01_02_425006 107 67 ATUs ATUs NNPS 10_1101-2021_01_02_425006 107 68 � � NNPS 10_1101-2021_01_02_425006 107 69 � � NNP 10_1101-2021_01_02_425006 107 70 , , , 10_1101-2021_01_02_425006 107 71 � � NNP 10_1101-2021_01_02_425006 107 72 for for IN 10_1101-2021_01_02_425006 107 73 � � NNP 10_1101-2021_01_02_425006 107 74 , , , 10_1101-2021_01_02_425006 107 75 1 1 CD 10_1101-2021_01_02_425006 107 76 ≤ ≤ NN 10_1101-2021_01_02_425006 107 77 � � NNP 10_1101-2021_01_02_425006 107 78 ≤ ≤ NNP 10_1101-2021_01_02_425006 107 79 � � NNP 10_1101-2021_01_02_425006 107 80 ≤ ≤ NN 10_1101-2021_01_02_425006 107 81 � � NNP 10_1101-2021_01_02_425006 107 82 . . . 
10_1101-2021_01_02_425006 108 1 Additionally additionally RB 10_1101-2021_01_02_425006 108 2 , , , 10_1101-2021_01_02_425006 108 3 to to TO 10_1101-2021_01_02_425006 108 4 control control VB 10_1101-2021_01_02_425006 108 5 the the DT 10_1101-2021_01_02_425006 108 6 176 176 CD 
10_1101-2021_01_02_425006 109 25 number number NN 10_1101-2021_01_02_425006 109 26 of of IN 10_1101-2021_01_02_425006 109 27 optimal optimal JJ 10_1101-2021_01_02_425006 109 28 solutions solution NNS 10_1101-2021_01_02_425006 109 29 and and CC 10_1101-2021_01_02_425006 109 30 reduce reduce VB 10_1101-2021_01_02_425006 109 31 the the DT 10_1101-2021_01_02_425006 109 32 false false JJ 10_1101-2021_01_02_425006 109 33 - - HYPH 10_1101-2021_01_02_425006 109 34 positive positive JJ 
10_1101-2021_01_02_425006 109 35 errors error NNS 10_1101-2021_01_02_425006 109 36 , , , 10_1101-2021_01_02_425006 109 37 we -PRON- PRP 10_1101-2021_01_02_425006 109 38 added add VBD 10_1101-2021_01_02_425006 109 39 an an DT 10_1101-2021_01_02_425006 109 40 � � NNP 10_1101-2021_01_02_425006 109 41 � � JJ 10_1101-2021_01_02_425006 109 42 regularization regularization NN 10_1101-2021_01_02_425006 109 43 � � . 10_1101-2021_01_02_425006 109 44 || || NNP 10_1101-2021_01_02_425006 109 45 � � JJ 10_1101-2021_01_02_425006 109 46 || || ADD 10_1101-2021_01_02_425006 109 47 � � NNP 10_1101-2021_01_02_425006 109 48 177 177 CD 10_1101-2021_01_02_425006 109 49 to to IN 10_1101-2021_01_02_425006 109 50 � � ADD 10_1101-2021_01_02_425006 109 51 � � NNP 10_1101-2021_01_02_425006 109 52 � � NNP 10_1101-2021_01_02_425006 109 53 with with IN 10_1101-2021_01_02_425006 109 54 � � NNP 10_1101-2021_01_02_425006 109 55 � � NNP 10_1101-2021_01_02_425006 109 56 , , , 10_1101-2021_01_02_425006 109 57 � � NNP 10_1101-2021_01_02_425006 109 58 ≥ ≥ NN 10_1101-2021_01_02_425006 109 59 0 0 CD 10_1101-2021_01_02_425006 109 60 , , , 10_1101-2021_01_02_425006 109 61 which which WDT 10_1101-2021_01_02_425006 109 62 is be VBZ 10_1101-2021_01_02_425006 109 63 a a DT 10_1101-2021_01_02_425006 109 64 linear linear JJ 10_1101-2021_01_02_425006 109 65 function function NN 10_1101-2021_01_02_425006 109 66 . . . 
10_1101-2021_01_02_425006 110 1 Because because IN 10_1101-2021_01_02_425006 110 2 of of IN 10_1101-2021_01_02_425006 110 3 the the DT 10_1101-2021_01_02_425006 110 4 variant variant JJ 10_1101-2021_01_02_425006 110 5 expression expression NN 10_1101-2021_01_02_425006 110 6 level level NN 10_1101-2021_01_02_425006 110 7 of of IN 10_1101-2021_01_02_425006 110 8 different different JJ 10_1101-2021_01_02_425006 110 9 178 178 CD 10_1101-2021_01_02_425006 110 10 maximal maximal JJ 10_1101-2021_01_02_425006 110 11 ATU ATU NNP 10_1101-2021_01_02_425006 110 12 clusters cluster NNS 10_1101-2021_01_02_425006 110 13 , , , 10_1101-2021_01_02_425006 110 14 we -PRON- PRP 10_1101-2021_01_02_425006 110 15 used use VBD 10_1101-2021_01_02_425006 110 16 the the DT 10_1101-2021_01_02_425006 110 17 expression expression NN 10_1101-2021_01_02_425006 110 18 value value NN 10_1101-2021_01_02_425006 110 19 of of IN 10_1101-2021_01_02_425006 110 20 � � NNP 10_1101-2021_01_02_425006 110 21 as as IN 10_1101-2021_01_02_425006 110 22 � � NNP 10_1101-2021_01_02_425006 110 23 . . . 
10_1101-2021_01_02_425006 111 1 In in IN 10_1101-2021_01_02_425006 111 2 total total NN 10_1101-2021_01_02_425006 111 3 , , , 10_1101-2021_01_02_425006 111 4 the the DT 10_1101-2021_01_02_425006 111 5 convex convex NNP 10_1101-2021_01_02_425006 111 6 quadratic quadratic NNP 10_1101-2021_01_02_425006 111 7 179 179 CD 10_1101-2021_01_02_425006 111 8 programming programming NN 10_1101-2021_01_02_425006 111 9 model model NN 10_1101-2021_01_02_425006 111 10 with with IN 10_1101-2021_01_02_425006 111 11 unknown unknown JJ 10_1101-2021_01_02_425006 111 12 variables variable NNS 10_1101-2021_01_02_425006 111 13 ( ( -LRB- 10_1101-2021_01_02_425006 111 14 � � NNP 10_1101-2021_01_02_425006 111 15 , , , 10_1101-2021_01_02_425006 111 16 � � NNP 10_1101-2021_01_02_425006 111 17 ) ) -RRB- 10_1101-2021_01_02_425006 111 18 is be VBZ 10_1101-2021_01_02_425006 111 19 shown show VBN 10_1101-2021_01_02_425006 111 20 as as IN 10_1101-2021_01_02_425006 111 21 follows follow VBZ 10_1101-2021_01_02_425006 111 22 : : : 10_1101-2021_01_02_425006 111 23 180 180 CD 10_1101-2021_01_02_425006 111 24 � � NNS 10_1101-2021_01_02_425006 111 25 � � , 10_1101-2021_01_02_425006 111 26 � � JJ 10_1101-2021_01_02_425006 111 27 � � JJ 10_1101-2021_01_02_425006 111 28 � � NNS 10_1101-2021_01_02_425006 111 29 � � NN 10_1101-2021_01_02_425006 111 30 + + SYM 10_1101-2021_01_02_425006 111 31 � � NNS 10_1101-2021_01_02_425006 111 32 || || NNP 10_1101-2021_01_02_425006 111 33 � � , 10_1101-2021_01_02_425006 111 34 || || ADD 10_1101-2021_01_02_425006 111 35 � � NNP 10_1101-2021_01_02_425006 111 36 � � NNP 10_1101-2021_01_02_425006 111 37 . . . 10_1101-2021_01_02_425006 112 1 � � NNP 10_1101-2021_01_02_425006 112 2 . . . 10_1101-2021_01_02_425006 113 1 ∑ ∑ NFP 10_1101-2021_01_02_425006 113 2 ∑ ∑ . 
10_1101-2021_01_02_425006 113 3 � � NNP 10_1101-2021_01_02_425006 113 4 � � NNP 10_1101-2021_01_02_425006 113 5 , , , 10_1101-2021_01_02_425006 113 6 � � NNP 10_1101-2021_01_02_425006 113 7 � � NNP 10_1101-2021_01_02_425006 113 8 � � NNP 10_1101-2021_01_02_425006 113 9 , , , 10_1101-2021_01_02_425006 113 10 � � NNP 10_1101-2021_01_02_425006 113 11 � � ADD 10_1101-2021_01_02_425006 113 12 � � VBZ 10_1101-2021_01_02_425006 113 13 � � VBZ 10_1101-2021_01_02_425006 113 14 � � VBZ 10_1101-2021_01_02_425006 113 15 � � VBZ 10_1101-2021_01_02_425006 113 16 � � JJ 10_1101-2021_01_02_425006 113 17 � � NNS 10_1101-2021_01_02_425006 113 18 � � NNP 10_1101-2021_01_02_425006 113 19 = = VBZ 10_1101-2021_01_02_425006 113 20 � � NNP 10_1101-2021_01_02_425006 113 21 � � NNP 10_1101-2021_01_02_425006 113 22 + + SYM 10_1101-2021_01_02_425006 113 23 � � NNS 10_1101-2021_01_02_425006 113 24 � � NNP 10_1101-2021_01_02_425006 113 25 � � NNS 10_1101-2021_01_02_425006 113 26 = = SYM 10_1101-2021_01_02_425006 113 27 1,2 1,2 CD 10_1101-2021_01_02_425006 113 28 , , , 10_1101-2021_01_02_425006 113 29 … … NFP 10_1101-2021_01_02_425006 113 30 , , , 10_1101-2021_01_02_425006 113 31 � � NNP 10_1101-2021_01_02_425006 113 32 ∑ ∑ . 10_1101-2021_01_02_425006 113 33 ∑ ∑ . 
10_1101-2021_01_02_425006 113 34 � � NNP 10_1101-2021_01_02_425006 113 35 � � NNP 10_1101-2021_01_02_425006 113 36 , , , 10_1101-2021_01_02_425006 113 37 � � NNP 10_1101-2021_01_02_425006 113 38 � � ADD 10_1101-2021_01_02_425006 113 39 � � JJ 10_1101-2021_01_02_425006 113 40 � � NNP 10_1101-2021_01_02_425006 113 41 � � NNP 10_1101-2021_01_02_425006 113 42 , , , 10_1101-2021_01_02_425006 113 43 � � NNP 10_1101-2021_01_02_425006 113 44 � � ADD 10_1101-2021_01_02_425006 113 45 � � VBZ 10_1101-2021_01_02_425006 113 46 � � VBZ 10_1101-2021_01_02_425006 113 47 � � VBZ 10_1101-2021_01_02_425006 113 48 � � VBZ 10_1101-2021_01_02_425006 113 49 � � VBZ 10_1101-2021_01_02_425006 113 50 � � VBZ 10_1101-2021_01_02_425006 113 51 � � JJ 10_1101-2021_01_02_425006 113 52 � � NNS 10_1101-2021_01_02_425006 113 53 � � NNP 10_1101-2021_01_02_425006 113 54 = = VBZ 10_1101-2021_01_02_425006 113 55 � � NNP 10_1101-2021_01_02_425006 113 56 � � NNP 10_1101-2021_01_02_425006 113 57 , , , 10_1101-2021_01_02_425006 113 58 � � NNP 10_1101-2021_01_02_425006 113 59 � � NNP 10_1101-2021_01_02_425006 113 60 � � NN 10_1101-2021_01_02_425006 113 61 + + CC 10_1101-2021_01_02_425006 113 62 � � NNS 10_1101-2021_01_02_425006 113 63 � � NNP 10_1101-2021_01_02_425006 113 64 � � NNS 10_1101-2021_01_02_425006 113 65 = = SYM 10_1101-2021_01_02_425006 113 66 1,2 1,2 CD 10_1101-2021_01_02_425006 113 67 , , , 10_1101-2021_01_02_425006 113 68 … … NFP 10_1101-2021_01_02_425006 113 69 , , , 10_1101-2021_01_02_425006 113 70 � � NNP 10_1101-2021_01_02_425006 113 71 − − NNP 10_1101-2021_01_02_425006 113 72 1 1 CD 10_1101-2021_01_02_425006 113 73 � � NNS 10_1101-2021_01_02_425006 113 74 = = SYM 10_1101-2021_01_02_425006 113 75 � � NNP 10_1101-2021_01_02_425006 113 76 � � NNP 10_1101-2021_01_02_425006 113 77 � � NNP 10_1101-2021_01_02_425006 113 78 , , , 10_1101-2021_01_02_425006 113 79 � � NNP 10_1101-2021_01_02_425006 113 80 � � NNP 10_1101-2021_01_02_425006 113 81 , , , 10_1101-2021_01_02_425006 113 82 � � NNP 
10_1101-2021_01_02_425006 113 83 � � NNP 10_1101-2021_01_02_425006 113 84 , , , 10_1101-2021_01_02_425006 113 85 � � NNP 10_1101-2021_01_02_425006 113 86 ≥ ≥ CD 10_1101-2021_01_02_425006 113 87 0 0 CD 10_1101-2021_01_02_425006 113 88 1 1 CD 10_1101-2021_01_02_425006 113 89 ≤ ≤ NN 10_1101-2021_01_02_425006 113 90 � � NNP 10_1101-2021_01_02_425006 113 91 ≤ ≤ NNP 10_1101-2021_01_02_425006 113 92 � � NNP 10_1101-2021_01_02_425006 113 93 ≤ ≤ NN 10_1101-2021_01_02_425006 113 94 � � NNS 10_1101-2021_01_02_425006 113 95 � � ADD 10_1101-2021_01_02_425006 113 96 = = NFP 10_1101-2021_01_02_425006 113 97 ( ( -LRB- 10_1101-2021_01_02_425006 113 98 � � NNP 10_1101-2021_01_02_425006 113 99 � � NNP 10_1101-2021_01_02_425006 113 100 , , , 10_1101-2021_01_02_425006 113 101 � � NNP 10_1101-2021_01_02_425006 113 102 � � NNP 10_1101-2021_01_02_425006 113 103 , , , 10_1101-2021_01_02_425006 113 104 … … NFP 10_1101-2021_01_02_425006 113 105 , , , 10_1101-2021_01_02_425006 113 106 � � NNP 10_1101-2021_01_02_425006 113 107 � � NNP 10_1101-2021_01_02_425006 113 108 , , , 10_1101-2021_01_02_425006 113 109 � � NNP 10_1101-2021_01_02_425006 113 110 � � NNP 10_1101-2021_01_02_425006 113 111 , , , 10_1101-2021_01_02_425006 113 112 … … NFP 10_1101-2021_01_02_425006 113 113 , , , 10_1101-2021_01_02_425006 113 114 � � NNP 10_1101-2021_01_02_425006 113 115 � � ADD 10_1101-2021_01_02_425006 113 116 � � NNP 10_1101-2021_01_02_425006 113 117 � � NNP 10_1101-2021_01_02_425006 113 118 ) ) -RRB- 10_1101-2021_01_02_425006 113 119 ( ( -LRB- 10_1101-2021_01_02_425006 113 120 6 6 CD 10_1101-2021_01_02_425006 113 121 ) ) -RRB- 10_1101-2021_01_02_425006 113 122 where where WRB 10_1101-2021_01_02_425006 113 123 � � NNP 10_1101-2021_01_02_425006 113 124 = = NFP 10_1101-2021_01_02_425006 113 125 ( ( -LRB- 10_1101-2021_01_02_425006 113 126 � � NNP 10_1101-2021_01_02_425006 113 127 � � NNP 10_1101-2021_01_02_425006 113 128 , , , 10_1101-2021_01_02_425006 113 129 � � NNP 10_1101-2021_01_02_425006 113 130 ) ) -RRB- 
10_1101-2021_01_02_425006 113 131 is be VBZ 10_1101-2021_01_02_425006 113 132 the the DT 10_1101-2021_01_02_425006 113 133 genetic genetic JJ 10_1101-2021_01_02_425006 113 134 region region NN 10_1101-2021_01_02_425006 113 135 bias bias NN 10_1101-2021_01_02_425006 113 136 rate rate NN 10_1101-2021_01_02_425006 113 137 vector vector NN 10_1101-2021_01_02_425006 113 138 for for IN 10_1101-2021_01_02_425006 113 139 � � NNP 10_1101-2021_01_02_425006 113 140 , , , 10_1101-2021_01_02_425006 113 141 � � NNP 10_1101-2021_01_02_425006 113 142 � � NNP 10_1101-2021_01_02_425006 113 143 , , , 10_1101-2021_01_02_425006 113 144 � � NNP 10_1101-2021_01_02_425006 113 145 is be VBZ 10_1101-2021_01_02_425006 113 146 the the DT 10_1101-2021_01_02_425006 113 147 bias bias NN 10_1101-2021_01_02_425006 113 148 rate rate NN 10_1101-2021_01_02_425006 113 149 of of IN 10_1101-2021_01_02_425006 113 150 gene gene NN 10_1101-2021_01_02_425006 113 151 � � NNP 10_1101-2021_01_02_425006 113 152 � � NNP 10_1101-2021_01_02_425006 113 153 for for IN 10_1101-2021_01_02_425006 113 154 ATU ATU NNP 10_1101-2021_01_02_425006 113 155 181 181 CD 10_1101-2021_01_02_425006 113 156 � � NNP 10_1101-2021_01_02_425006 113 157 � � NNP 10_1101-2021_01_02_425006 113 158 , , , 10_1101-2021_01_02_425006 113 159 � � NNP 10_1101-2021_01_02_425006 113 160 , , , 10_1101-2021_01_02_425006 113 161 1 1 CD 10_1101-2021_01_02_425006 113 162 ≤ ≤ NN 10_1101-2021_01_02_425006 113 163 � � NNP 10_1101-2021_01_02_425006 113 164 ≤ ≤ NNP 10_1101-2021_01_02_425006 113 165 � � NNP 10_1101-2021_01_02_425006 113 166 ≤ ≤ NN 10_1101-2021_01_02_425006 113 167 � � NNP 10_1101-2021_01_02_425006 113 168 , , , 10_1101-2021_01_02_425006 113 169 � � NNP 10_1101-2021_01_02_425006 113 170 ≤ ≤ CC 10_1101-2021_01_02_425006 113 171 � � NNP 10_1101-2021_01_02_425006 113 172 ≤ ≤ NNP 10_1101-2021_01_02_425006 113 173 � � NNP 10_1101-2021_01_02_425006 113 174 , , , 10_1101-2021_01_02_425006 113 175 � � NNP 10_1101-2021_01_02_425006 113 176 = = NFP 
10_1101-2021_01_02_425006 113 177 ( ( -LRB- 10_1101-2021_01_02_425006 113 178 � � NNP 10_1101-2021_01_02_425006 113 179 � � NNP 10_1101-2021_01_02_425006 113 180 , , , 10_1101-2021_01_02_425006 113 181 � � NNP 10_1101-2021_01_02_425006 113 182 ) ) -RRB- 10_1101-2021_01_02_425006 113 183 is be VBZ 10_1101-2021_01_02_425006 113 184 the the DT 10_1101-2021_01_02_425006 113 185 intergenic intergenic JJ 10_1101-2021_01_02_425006 113 186 region region NN 10_1101-2021_01_02_425006 113 187 bias bias NN 10_1101-2021_01_02_425006 113 188 rate rate NN 10_1101-2021_01_02_425006 113 189 vector vector NN 10_1101-2021_01_02_425006 113 190 for for IN 10_1101-2021_01_02_425006 113 191 � � NNP 10_1101-2021_01_02_425006 113 192 , , , 10_1101-2021_01_02_425006 113 193 and and CC 10_1101-2021_01_02_425006 113 194 � � NNP 10_1101-2021_01_02_425006 113 195 � � NNP 10_1101-2021_01_02_425006 113 196 , , , 10_1101-2021_01_02_425006 113 197 � � NNP 10_1101-2021_01_02_425006 113 198 182 182 CD 10_1101-2021_01_02_425006 113 199 is be VBZ 10_1101-2021_01_02_425006 113 200 the the DT 10_1101-2021_01_02_425006 113 201 bias bias NN 10_1101-2021_01_02_425006 113 202 rate rate NN 10_1101-2021_01_02_425006 113 203 of of IN 10_1101-2021_01_02_425006 113 204 the the DT 10_1101-2021_01_02_425006 113 205 intergenic intergenic JJ 10_1101-2021_01_02_425006 113 206 region region NN 10_1101-2021_01_02_425006 113 207 � � . 
10_1101-2021_01_02_425006 113 208 � � JJ 10_1101-2021_01_02_425006 113 209 � � NNP 10_1101-2021_01_02_425006 113 210 � � NNP 10_1101-2021_01_02_425006 113 211 , , , 10_1101-2021_01_02_425006 113 212 � � NNP 10_1101-2021_01_02_425006 113 213 for for IN 10_1101-2021_01_02_425006 113 214 ATU ATU NNP 10_1101-2021_01_02_425006 113 215 � � NNP 10_1101-2021_01_02_425006 113 216 � � NNP 10_1101-2021_01_02_425006 113 217 , , , 10_1101-2021_01_02_425006 113 218 � � NNP 10_1101-2021_01_02_425006 113 219 , , , 10_1101-2021_01_02_425006 113 220 1 1 CD 10_1101-2021_01_02_425006 113 221 ≤ ≤ NN 10_1101-2021_01_02_425006 113 222 � � NNP 10_1101-2021_01_02_425006 113 223 < < XX 10_1101-2021_01_02_425006 113 224 � � NNP 10_1101-2021_01_02_425006 113 225 ≤ ≤ JJ 10_1101-2021_01_02_425006 113 226 � � NNP 10_1101-2021_01_02_425006 113 227 , , , 10_1101-2021_01_02_425006 113 228 � � NNP 10_1101-2021_01_02_425006 113 229 ≤ ≤ CC 10_1101-2021_01_02_425006 113 230 � � NNP 10_1101-2021_01_02_425006 113 231 ≤ ≤ NN 10_1101-2021_01_02_425006 113 232 � � NNP 10_1101-2021_01_02_425006 113 233 ( ( -LRB- 10_1101-2021_01_02_425006 113 234 see see VB 10_1101-2021_01_02_425006 113 235 the the DT 10_1101-2021_01_02_425006 113 236 183 183 CD 10_1101-2021_01_02_425006 113 237 details detail NNS 10_1101-2021_01_02_425006 113 238 in in IN 10_1101-2021_01_02_425006 113 239 method method NNP 10_1101-2021_01_02_425006 113 240 S5 S5 NNP 10_1101-2021_01_02_425006 113 241 ) ) -RRB- 10_1101-2021_01_02_425006 113 242 . . . 
10_1101-2021_01_02_425006 114 1 184 184 CD 10_1101-2021_01_02_425006 114 2 Two two CD 10_1101-2021_01_02_425006 114 3 evaluation evaluation NN 10_1101-2021_01_02_425006 114 4 methods method NNS 10_1101-2021_01_02_425006 114 5 for for IN 10_1101-2021_01_02_425006 114 6 ATU ATU NNP 10_1101-2021_01_02_425006 114 7 prediction prediction NN 10_1101-2021_01_02_425006 114 8 185 185 CD 10_1101-2021_01_02_425006 114 9 In in IN 10_1101-2021_01_02_425006 114 10 the the DT 10_1101-2021_01_02_425006 114 11 first first JJ 10_1101-2021_01_02_425006 114 12 evaluation evaluation NN 10_1101-2021_01_02_425006 114 13 method method NN 10_1101-2021_01_02_425006 114 14 , , , 10_1101-2021_01_02_425006 114 15 precision precision NN 10_1101-2021_01_02_425006 114 16 and and CC 10_1101-2021_01_02_425006 114 17 recall recall NN 10_1101-2021_01_02_425006 114 18 were be VBD 10_1101-2021_01_02_425006 114 19 defined define VBN 10_1101-2021_01_02_425006 114 20 based base VBN 10_1101-2021_01_02_425006 114 21 on on IN 10_1101-2021_01_02_425006 114 22 perfect perfect JJ 10_1101-2021_01_02_425006 114 23 matching matching NN 10_1101-2021_01_02_425006 114 24 ( ( -LRB- 10_1101-2021_01_02_425006 114 25 Eqs Eqs NNP 10_1101-2021_01_02_425006 114 26 . . . 10_1101-2021_01_02_425006 115 1 7 7 LS 10_1101-2021_01_02_425006 115 2 ) ) -RRB- 10_1101-2021_01_02_425006 115 3 . . . 
10_1101-2021_01_02_425006 116 1 186 186 CD 10_1101-2021_01_02_425006 116 2 Perfect perfect JJ 10_1101-2021_01_02_425006 116 3 matching matching NN 10_1101-2021_01_02_425006 116 4 of of IN 10_1101-2021_01_02_425006 116 5 two two CD 10_1101-2021_01_02_425006 116 6 ATUs atu NNS 10_1101-2021_01_02_425006 116 7 means mean VBZ 10_1101-2021_01_02_425006 116 8 that that IN 10_1101-2021_01_02_425006 116 9 all all DT 10_1101-2021_01_02_425006 116 10 of of IN 10_1101-2021_01_02_425006 116 11 their -PRON- PRP$ 10_1101-2021_01_02_425006 116 12 component component NN 10_1101-2021_01_02_425006 116 13 genes gene NNS 10_1101-2021_01_02_425006 116 14 are be VBP 10_1101-2021_01_02_425006 116 15 the the DT 10_1101-2021_01_02_425006 116 16 same same JJ 10_1101-2021_01_02_425006 116 17 . . . 10_1101-2021_01_02_425006 117 1 Here here RB 10_1101-2021_01_02_425006 117 2 , , , 10_1101-2021_01_02_425006 117 3 the the DT 10_1101-2021_01_02_425006 117 4 true true JJ 10_1101-2021_01_02_425006 117 5 187 187 CD 10_1101-2021_01_02_425006 117 6 positives positive NNS 10_1101-2021_01_02_425006 117 7 ( ( -LRB- 10_1101-2021_01_02_425006 117 8 � � NNP 10_1101-2021_01_02_425006 117 9 � � NNP 10_1101-2021_01_02_425006 117 10 ) ) -RRB- 10_1101-2021_01_02_425006 117 11 are be VBP 10_1101-2021_01_02_425006 117 12 the the DT 10_1101-2021_01_02_425006 117 13 number number NN 10_1101-2021_01_02_425006 117 14 of of IN 10_1101-2021_01_02_425006 117 15 predicted predict VBN 10_1101-2021_01_02_425006 117 16 ATUs atu NNS 10_1101-2021_01_02_425006 117 17 with with IN 10_1101-2021_01_02_425006 117 18 the the DT 10_1101-2021_01_02_425006 117 19 same same JJ 10_1101-2021_01_02_425006 117 20 component component NN 10_1101-2021_01_02_425006 117 21 genes gene NNS 10_1101-2021_01_02_425006 117 22 as as IN 10_1101-2021_01_02_425006 117 23 an an DT 10_1101-2021_01_02_425006 117 24 ATU ATU NNP 10_1101-2021_01_02_425006 117 25 in in IN 10_1101-2021_01_02_425006 117 26 the the DT 10_1101-2021_01_02_425006 117 27 188 188 CD 
10_1101-2021_01_02_425006 117 28 evaluation evaluation NN 10_1101-2021_01_02_425006 117 29 data datum NNS 10_1101-2021_01_02_425006 117 30 ; ; : 10_1101-2021_01_02_425006 117 31 the the DT 10_1101-2021_01_02_425006 117 32 false false JJ 10_1101-2021_01_02_425006 117 33 positives positive NNS 10_1101-2021_01_02_425006 117 34 ( ( -LRB- 10_1101-2021_01_02_425006 117 35 � � NNP 10_1101-2021_01_02_425006 117 36 � � NNP 10_1101-2021_01_02_425006 117 37 ) ) -RRB- 10_1101-2021_01_02_425006 117 38 are be VBP 10_1101-2021_01_02_425006 117 39 the the DT 10_1101-2021_01_02_425006 117 40 number number NN 10_1101-2021_01_02_425006 117 41 of of IN 10_1101-2021_01_02_425006 117 42 predicted predict VBN 10_1101-2021_01_02_425006 117 43 ATUs atu NNS 10_1101-2021_01_02_425006 117 44 that that WDT 10_1101-2021_01_02_425006 117 45 do do VBP 10_1101-2021_01_02_425006 117 46 not not RB 10_1101-2021_01_02_425006 117 47 exist exist VB 10_1101-2021_01_02_425006 117 48 in in IN 10_1101-2021_01_02_425006 117 49 the the DT 10_1101-2021_01_02_425006 117 50 189 189 CD
10_1101-2021_01_02_425006 118 25 evaluation evaluation NN 10_1101-2021_01_02_425006 118 26 data datum NNS 10_1101-2021_01_02_425006 118 27 ; ; : 10_1101-2021_01_02_425006 118 28 the the DT 10_1101-2021_01_02_425006 118 29 false false JJ 10_1101-2021_01_02_425006 118 30 negatives negative NNS 10_1101-2021_01_02_425006 118 31 ( ( -LRB- 10_1101-2021_01_02_425006 118 32 � � NNP 10_1101-2021_01_02_425006 118 33 � � NNP 10_1101-2021_01_02_425006 118 34 ) ) -RRB- 10_1101-2021_01_02_425006 118 35 are be VBP 10_1101-2021_01_02_425006 118 36 the the DT 10_1101-2021_01_02_425006 118 37 number number NN 10_1101-2021_01_02_425006 118 38 of of IN 10_1101-2021_01_02_425006 118 39 ATUs atu NNS 10_1101-2021_01_02_425006 118 40 that that WDT 10_1101-2021_01_02_425006 118 41 appear appear VBP 10_1101-2021_01_02_425006 118 42 in in IN 10_1101-2021_01_02_425006 118 43 the the DT 10_1101-2021_01_02_425006 118 44 evaluation evaluation NN 10_1101-2021_01_02_425006 118 45 data datum NNS 10_1101-2021_01_02_425006 118 46 but but CC 10_1101-2021_01_02_425006 118 47 190 190 CD 10_1101-2021_01_02_425006 118 48 not not RB 10_1101-2021_01_02_425006 118 49 in in IN 10_1101-2021_01_02_425006 118 50 the the DT 10_1101-2021_01_02_425006 118 51 prediction prediction NN 10_1101-2021_01_02_425006 118 52 results result NNS 10_1101-2021_01_02_425006 118 53 of of IN 10_1101-2021_01_02_425006 118 54 SeqATU
SeqATU NNP 10_1101-2021_01_02_425006 118 55 . . . 10_1101-2021_01_02_425006 119 1 191 191 CD 10_1101-2021_01_02_425006 119 2 � � NNS 10_1101-2021_01_02_425006 119 3 � � , 10_1101-2021_01_02_425006 119 4 � � NNS 10_1101-2021_01_02_425006 119 5 � � VBZ 10_1101-2021_01_02_425006 119 6 � � VBZ 10_1101-2021_01_02_425006 119 7 � � VBZ 10_1101-2021_01_02_425006 119 8 � � JJ 10_1101-2021_01_02_425006 119 9 � � NNS 10_1101-2021_01_02_425006 119 10 � � NNP 10_1101-2021_01_02_425006 119 11 = = VBZ 10_1101-2021_01_02_425006 119 12 � � ADD 10_1101-2021_01_02_425006 119 13 � � VBZ 10_1101-2021_01_02_425006 119 14 � � NNS 10_1101-2021_01_02_425006 119 15 � � NN 10_1101-2021_01_02_425006 119 16 + + CC 10_1101-2021_01_02_425006 119 17 � � NNS 10_1101-2021_01_02_425006 119 18 � � NNS 10_1101-2021_01_02_425006 119 19 � � NNS 10_1101-2021_01_02_425006 119 20 � � VBZ 10_1101-2021_01_02_425006 119 21 � � VBZ 10_1101-2021_01_02_425006 119 22 � � JJ 10_1101-2021_01_02_425006 119 23 � � NNS 10_1101-2021_01_02_425006 119 24 � � NNP 10_1101-2021_01_02_425006 119 25 = = VBZ 10_1101-2021_01_02_425006 119 26 � � ADD 10_1101-2021_01_02_425006 119 27 � � VBZ 10_1101-2021_01_02_425006 119 28 � � NNS 10_1101-2021_01_02_425006 119 29 � � NN 10_1101-2021_01_02_425006 119 30 + + SYM 10_1101-2021_01_02_425006 119 31 � � NNS 10_1101-2021_01_02_425006 119 32 � � NNP 10_1101-2021_01_02_425006 119 33 ( ( -LRB- 10_1101-2021_01_02_425006 119 34 7 7 CD 10_1101-2021_01_02_425006 119 35 ) ) -RRB- 10_1101-2021_01_02_425006 119 36 In in IN 10_1101-2021_01_02_425006 119 37 the the DT 10_1101-2021_01_02_425006 119 38 second second JJ 10_1101-2021_01_02_425006 119 39 evaluation evaluation NN 10_1101-2021_01_02_425006 119 40 method method NN 10_1101-2021_01_02_425006 119 41 , , , 10_1101-2021_01_02_425006 119 42 precision precision NN 10_1101-2021_01_02_425006 119 43 and and CC 10_1101-2021_01_02_425006 119 44 recall recall NN 10_1101-2021_01_02_425006 119 45 were be VBD 10_1101-2021_01_02_425006 119 46 defined 
define VBN 10_1101-2021_01_02_425006 119 47 based base VBN 10_1101-2021_01_02_425006 119 48 on on IN 10_1101-2021_01_02_425006 119 49 relaxed relaxed JJ 10_1101-2021_01_02_425006 119 50 matching matching NN 10_1101-2021_01_02_425006 119 51 , , , 10_1101-2021_01_02_425006 119 52 which which WDT 10_1101-2021_01_02_425006 119 53 192 192 CD 10_1101-2021_01_02_425006 119 54 is be VBZ 10_1101-2021_01_02_425006 119 55 measured measure VBN 10_1101-2021_01_02_425006 119 56 by by IN 10_1101-2021_01_02_425006 119 57 the the DT 10_1101-2021_01_02_425006 119 58 similarity similarity NN 10_1101-2021_01_02_425006 119 59 of of IN 10_1101-2021_01_02_425006 119 60 two two CD 10_1101-2021_01_02_425006 119 61 ATUs atu NNS 10_1101-2021_01_02_425006 119 62 . . . 10_1101-2021_01_02_425006 120 1 Assuming assume VBG 10_1101-2021_01_02_425006 120 2 that that IN 10_1101-2021_01_02_425006 120 3 an an DT 10_1101-2021_01_02_425006 120 4 ATU ATU NNP 10_1101-2021_01_02_425006 120 5 � � NNP 10_1101-2021_01_02_425006 120 6 is be VBZ 10_1101-2021_01_02_425006 120 7 in in IN 10_1101-2021_01_02_425006 120 8 one one CD 10_1101-2021_01_02_425006 120 9 of of IN 10_1101-2021_01_02_425006 120 10 two two CD 10_1101-2021_01_02_425006 120 11 datasets dataset NNS 10_1101-2021_01_02_425006 120 12 ( ( -LRB- 10_1101-2021_01_02_425006 120 13 the the DT 10_1101-2021_01_02_425006 120 14 193 193 CD 10_1101-2021_01_02_425006 120 15 predicted predict VBN 10_1101-2021_01_02_425006 120 16 ATU ATU NNP 10_1101-2021_01_02_425006 120 17 dataset dataset NN 10_1101-2021_01_02_425006 120 18 and and CC 10_1101-2021_01_02_425006 120 19 evaluated evaluate VBD 10_1101-2021_01_02_425006 120 20 ATU ATU NNP 10_1101-2021_01_02_425006 120 21 dataset dataset NN 10_1101-2021_01_02_425006 120 22 ) ) -RRB- 10_1101-2021_01_02_425006 120 23 , , , 10_1101-2021_01_02_425006 120 24 the the DT 10_1101-2021_01_02_425006 120 25 definition definition NN 10_1101-2021_01_02_425006 120 26 and and CC 10_1101-2021_01_02_425006 120 27 calculation 
calculation NN 10_1101-2021_01_02_425006 120 28 of of IN 10_1101-2021_01_02_425006 120 29 the the DT 10_1101-2021_01_02_425006 120 30 similarity similarity NN 10_1101-2021_01_02_425006 120 31 of of IN 10_1101-2021_01_02_425006 120 32 � � NNP 10_1101-2021_01_02_425006 120 33 194 194 CD 10_1101-2021_01_02_425006 120 34 are be VBP 10_1101-2021_01_02_425006 120 35 shown show VBN 10_1101-2021_01_02_425006 120 36 in in IN 10_1101-2021_01_02_425006 120 37 the the DT 10_1101-2021_01_02_425006 120 38 following follow VBG 10_1101-2021_01_02_425006 120 39 three three CD 10_1101-2021_01_02_425006 120 40 cases case NNS 10_1101-2021_01_02_425006 120 41 : : : 10_1101-2021_01_02_425006 120 42 195 195 CD 10_1101-2021_01_02_425006 120 43 Case case NN 10_1101-2021_01_02_425006 120 44 1 1 CD 10_1101-2021_01_02_425006 120 45 : : : 10_1101-2021_01_02_425006 120 46 If if IN 10_1101-2021_01_02_425006 120 47 � � NNP 10_1101-2021_01_02_425006 120 48 shares share VBZ 10_1101-2021_01_02_425006 120 49 boundary boundary JJ 10_1101-2021_01_02_425006 120 50 genes gene NNS 10_1101-2021_01_02_425006 120 51 at at IN 10_1101-2021_01_02_425006 120 52 both both DT 10_1101-2021_01_02_425006 120 53 ends end NNS 10_1101-2021_01_02_425006 120 54 of of IN 10_1101-2021_01_02_425006 120 55 an an DT 10_1101-2021_01_02_425006 120 56 ATU ATU NNP 10_1101-2021_01_02_425006 120 57 in in IN 10_1101-2021_01_02_425006 120 58 the the DT 10_1101-2021_01_02_425006 120 59 other other JJ 10_1101-2021_01_02_425006 120 60 dataset dataset NN 10_1101-2021_01_02_425006 120 61 , , , 10_1101-2021_01_02_425006 120 62 i.e. i.e. 
FW 10_1101-2021_01_02_425006 120 63 , , , 10_1101-2021_01_02_425006 120 64 all all DT 10_1101-2021_01_02_425006 120 65 component component VBP 10_1101-2021_01_02_425006 120 66 196 196 CD 10_1101-2021_01_02_425006 120 67 genes gene NNS 10_1101-2021_01_02_425006 120 68 of of IN 10_1101-2021_01_02_425006 120 69 � � NNP 10_1101-2021_01_02_425006 120 70 are be VBP 10_1101-2021_01_02_425006 120 71 the the DT 10_1101-2021_01_02_425006 120 72 same same JJ 10_1101-2021_01_02_425006 120 73 as as IN 10_1101-2021_01_02_425006 120 74 one one CD 10_1101-2021_01_02_425006 120 75 in in IN 10_1101-2021_01_02_425006 120 76 the the DT 10_1101-2021_01_02_425006 120 77 other other JJ 10_1101-2021_01_02_425006 120 78 dataset dataset NN 10_1101-2021_01_02_425006 120 79 , , , 10_1101-2021_01_02_425006 120 80 then then RB 10_1101-2021_01_02_425006 120 81 � � ADD 10_1101-2021_01_02_425006 120 82 � � ADD 10_1101-2021_01_02_425006 120 83 � � VBZ 10_1101-2021_01_02_425006 120 84 � � VBZ 10_1101-2021_01_02_425006 120 85 � � VBZ 10_1101-2021_01_02_425006 120 86 � � VBZ 10_1101-2021_01_02_425006 120 87 � � VBZ 10_1101-2021_01_02_425006 120 88 � � JJ 10_1101-2021_01_02_425006 120 89 � � NNS 10_1101-2021_01_02_425006 120 90 � � NNP 10_1101-2021_01_02_425006 120 91 ( ( -LRB- 10_1101-2021_01_02_425006 120 92 � � NNP 10_1101-2021_01_02_425006 120 93 ) ) -RRB- 10_1101-2021_01_02_425006 120 94 = = NFP 10_1101-2021_01_02_425006 120 95 1 1 CD 10_1101-2021_01_02_425006 120 96 . . . 
10_1101-2021_01_02_425006 121 1 197 197 CD 10_1101-2021_01_02_425006 121 2 Case case NN 10_1101-2021_01_02_425006 121 3 2 2 CD 10_1101-2021_01_02_425006 121 4 : : : 10_1101-2021_01_02_425006 121 5 If if IN 10_1101-2021_01_02_425006 121 6 � � NNP 10_1101-2021_01_02_425006 121 7 shares share NNS 10_1101-2021_01_02_425006 121 8 exactly exactly RB 10_1101-2021_01_02_425006 121 9 one one CD 10_1101-2021_01_02_425006 121 10 boundary boundary JJ 10_1101-2021_01_02_425006 121 11 gene gene NN 10_1101-2021_01_02_425006 121 12 of of IN 10_1101-2021_01_02_425006 121 13 ATUs ATUs NNPS 10_1101-2021_01_02_425006 121 14 in in IN 10_1101-2021_01_02_425006 121 15 the the DT 10_1101-2021_01_02_425006 121 16 other other JJ 10_1101-2021_01_02_425006 121 17 dataset dataset NN 10_1101-2021_01_02_425006 121 18 , , , 10_1101-2021_01_02_425006 121 19 then then RB 10_1101-2021_01_02_425006 121 20 we -PRON- PRP 10_1101-2021_01_02_425006 121 21 denote denote VBP 10_1101-2021_01_02_425006 121 22 � � NNP 10_1101-2021_01_02_425006 121 23 � � NNP 10_1101-2021_01_02_425006 121 24 as as IN 10_1101-2021_01_02_425006 121 25 198 198 CD 10_1101-2021_01_02_425006 121 26 the the DT 10_1101-2021_01_02_425006 121 27 ATUs atu NNS 10_1101-2021_01_02_425006 121 28 in in IN 10_1101-2021_01_02_425006 121 29 the the DT 10_1101-2021_01_02_425006 121 30 other other JJ 10_1101-2021_01_02_425006 121 31 dataset dataset NN 10_1101-2021_01_02_425006 121 32 that that WDT 10_1101-2021_01_02_425006 121 33 share share VBP 10_1101-2021_01_02_425006 121 34 the the DT 10_1101-2021_01_02_425006 121 35 5’-end 5’-end CD 10_1101-2021_01_02_425006 121 36 gene gene NN 10_1101-2021_01_02_425006 121 37 with with IN 10_1101-2021_01_02_425006 121 38 � � NNP 10_1101-2021_01_02_425006 121 39 and and CC 10_1101-2021_01_02_425006 121 40 denoted denote VBD 10_1101-2021_01_02_425006 121 41 � � NNP 10_1101-2021_01_02_425006 121 42 � � NNP 10_1101-2021_01_02_425006 121 43 as as IN 10_1101-2021_01_02_425006 121 44 the the DT 
10_1101-2021_01_02_425006 121 45 ATUs atu NNS 10_1101-2021_01_02_425006 121 46 in in IN 10_1101-2021_01_02_425006 121 47 the the DT 10_1101-2021_01_02_425006 121 48 199 199 CD 10_1101-2021_01_02_425006 121 49 other other JJ 10_1101-2021_01_02_425006 121 50 dataset dataset NN 10_1101-2021_01_02_425006 121 51 that that WDT 10_1101-2021_01_02_425006 121 52 share share VBP 10_1101-2021_01_02_425006 121 53 the the DT 10_1101-2021_01_02_425006 121 54 3’-end 3’-end CD 10_1101-2021_01_02_425006 121 55 gene gene NN 10_1101-2021_01_02_425006 121 56 with with IN 10_1101-2021_01_02_425006 121 57 � � NNP 10_1101-2021_01_02_425006 121 58 , , , 10_1101-2021_01_02_425006 121 59 � � NNP 10_1101-2021_01_02_425006 121 60 � � NNP 10_1101-2021_01_02_425006 121 61 ∩ ∩ NN 10_1101-2021_01_02_425006 121 62 � � NNP 10_1101-2021_01_02_425006 121 63 � � NNP 10_1101-2021_01_02_425006 121 64 = = SYM 10_1101-2021_01_02_425006 121 65 ∅ ∅ NNP 10_1101-2021_01_02_425006 121 66 , , , 10_1101-2021_01_02_425006 121 67 one one CD 10_1101-2021_01_02_425006 121 68 of of IN 10_1101-2021_01_02_425006 121 69 � � NNP 10_1101-2021_01_02_425006 121 70 � � NNP 10_1101-2021_01_02_425006 121 71 and and CC 10_1101-2021_01_02_425006 121 72 � � NNP 10_1101-2021_01_02_425006 121 73 � � NNP 10_1101-2021_01_02_425006 121 74 can can MD 10_1101-2021_01_02_425006 121 75 be be VB 10_1101-2021_01_02_425006 121 76 empty empty JJ 10_1101-2021_01_02_425006 121 77 . . . 
10_1101-2021_01_02_425006 122 1 Then then RB 10_1101-2021_01_02_425006 122 2 , , , 10_1101-2021_01_02_425006 122 3 200 200 CD 10_1101-2021_01_02_425006 122 4 � � NNS 10_1101-2021_01_02_425006 122 5 � � JJ 10_1101-2021_01_02_425006 122 6 � � NNS 10_1101-2021_01_02_425006 122 7 � � VBZ 10_1101-2021_01_02_425006 122 8 � � VBZ 10_1101-2021_01_02_425006 122 9 � � VBZ 10_1101-2021_01_02_425006 122 10 � � VBZ 10_1101-2021_01_02_425006 122 11 � � JJ 10_1101-2021_01_02_425006 122 12 � � NNS 10_1101-2021_01_02_425006 122 13 � � NNP 10_1101-2021_01_02_425006 122 14 ( ( -LRB- 10_1101-2021_01_02_425006 122 15 � � NNP 10_1101-2021_01_02_425006 122 16 ) ) -RRB- 10_1101-2021_01_02_425006 122 17 = = NFP 10_1101-2021_01_02_425006 122 18 1 1 CD 10_1101-2021_01_02_425006 122 19 2 2 CD 10_1101-2021_01_02_425006 122 20 � � NNS 10_1101-2021_01_02_425006 122 21 � � JJ 10_1101-2021_01_02_425006 122 22 � � '' 10_1101-2021_01_02_425006 122 23 � � NNS 10_1101-2021_01_02_425006 122 24 � � VBZ 10_1101-2021_01_02_425006 122 25 ∈ ∈ JJ 10_1101-2021_01_02_425006 122 26 � � NNS 10_1101-2021_01_02_425006 122 27 � � NNS 10_1101-2021_01_02_425006 122 28 � � , 10_1101-2021_01_02_425006 122 29 ( ( -LRB- 10_1101-2021_01_02_425006 122 30 � � NNP 10_1101-2021_01_02_425006 122 31 � � NNP 10_1101-2021_01_02_425006 122 32 ) ) -RRB- 10_1101-2021_01_02_425006 122 33 � � NNP 10_1101-2021_01_02_425006 122 34 ( ( -LRB- 10_1101-2021_01_02_425006 122 35 � � NNP 10_1101-2021_01_02_425006 122 36 � � NNP 10_1101-2021_01_02_425006 122 37 ) ) -RRB- 10_1101-2021_01_02_425006 122 38 + + CC 10_1101-2021_01_02_425006 122 39 1 1 CD 10_1101-2021_01_02_425006 122 40 2 2 CD 10_1101-2021_01_02_425006 122 41 � � NNS 10_1101-2021_01_02_425006 122 42 � � JJ 10_1101-2021_01_02_425006 122 43 � � '' 10_1101-2021_01_02_425006 122 44 � � NNS 10_1101-2021_01_02_425006 122 45 � � VBZ 10_1101-2021_01_02_425006 122 46 ∈ ∈ JJ 10_1101-2021_01_02_425006 122 47 � � NNS 10_1101-2021_01_02_425006 122 48 � � NNS 10_1101-2021_01_02_425006 122 49 � � 
, 10_1101-2021_01_02_425006 122 50 ( ( -LRB- 10_1101-2021_01_02_425006 122 51 � � NNP 10_1101-2021_01_02_425006 122 52 � � NNP 10_1101-2021_01_02_425006 122 53 ) ) -RRB- 10_1101-2021_01_02_425006 122 54 � � NNP 10_1101-2021_01_02_425006 122 55 ( ( -LRB- 10_1101-2021_01_02_425006 122 56 � � NNP 10_1101-2021_01_02_425006 122 57 � � NNP 10_1101-2021_01_02_425006 122 58 ) ) -RRB- 10_1101-2021_01_02_425006 122 59 ( ( -LRB- 10_1101-2021_01_02_425006 122 60 8) 8) CD 10_1101-2021_01_02_425006 122 61 where where WRB 10_1101-2021_01_02_425006 122 62 � � NNP 10_1101-2021_01_02_425006 122 63 ( ( -LRB- 10_1101-2021_01_02_425006 122 64 � � NNP 10_1101-2021_01_02_425006 122 65 � � NNP 10_1101-2021_01_02_425006 122 66 ) ) -RRB- 10_1101-2021_01_02_425006 122 67 is be VBZ 10_1101-2021_01_02_425006 122 68 the the DT 10_1101-2021_01_02_425006 122 69 number number NN 10_1101-2021_01_02_425006 122 70 of of IN 10_1101-2021_01_02_425006 122 71 shared share VBN 10_1101-2021_01_02_425006 122 72 genes gene NNS 10_1101-2021_01_02_425006 122 73 of of IN 10_1101-2021_01_02_425006 122 74 � � NNP 10_1101-2021_01_02_425006 122 75 and and CC 10_1101-2021_01_02_425006 122 76 � � NNP 10_1101-2021_01_02_425006 122 77 � � NNP 10_1101-2021_01_02_425006 122 78 and and CC 10_1101-2021_01_02_425006 122 79 � � NNP 10_1101-2021_01_02_425006 122 80 ( ( -LRB- 10_1101-2021_01_02_425006 122 81 � � NNP 10_1101-2021_01_02_425006 122 82 � � NNP 10_1101-2021_01_02_425006 122 83 ) ) -RRB- 10_1101-2021_01_02_425006 122 84 is be VBZ 10_1101-2021_01_02_425006 122 85 the the DT 10_1101-2021_01_02_425006 122 86 maximal maximal JJ 10_1101-2021_01_02_425006 122 87 size size NN 10_1101-2021_01_02_425006 122 88 of of IN 10_1101-2021_01_02_425006 122 89 � � NNP 10_1101-2021_01_02_425006 122 90 and and CC 10_1101-2021_01_02_425006 122 91 � � NNP 10_1101-2021_01_02_425006 122 92 � � NNP 10_1101-2021_01_02_425006 122 93 . . . 
10_1101-2021_01_02_425006 123 1 201 201 CD 10_1101-2021_01_02_425006 123 2 Case case NN 10_1101-2021_01_02_425006 123 3 3 3 CD 10_1101-2021_01_02_425006 123 4 : : : 10_1101-2021_01_02_425006 123 5 If if IN 10_1101-2021_01_02_425006 123 6 � � NNP 10_1101-2021_01_02_425006 123 7 shares share NNS 10_1101-2021_01_02_425006 123 8 no no DT 10_1101-2021_01_02_425006 123 9 boundary boundary JJ 10_1101-2021_01_02_425006 123 10 genes gene NNS 10_1101-2021_01_02_425006 123 11 at at IN 10_1101-2021_01_02_425006 123 12 both both DT 10_1101-2021_01_02_425006 123 13 ends end NNS 10_1101-2021_01_02_425006 123 14 of of IN 10_1101-2021_01_02_425006 123 15 the the DT 10_1101-2021_01_02_425006 123 16 ATUs atu NNS 10_1101-2021_01_02_425006 123 17 in in IN 10_1101-2021_01_02_425006 123 18 the the DT 10_1101-2021_01_02_425006 123 19 other other JJ 10_1101-2021_01_02_425006 123 20 dataset dataset NN 10_1101-2021_01_02_425006 123 21 , , , 10_1101-2021_01_02_425006 123 22 then then RB 10_1101-2021_01_02_425006 123 23 202 202 CD 10_1101-2021_01_02_425006 123 24 � � NNS 10_1101-2021_01_02_425006 123 25 � � , 10_1101-2021_01_02_425006 123 26 � � , 10_1101-2021_01_02_425006 123 27 � � NNS 10_1101-2021_01_02_425006 123 28 � � VBZ 10_1101-2021_01_02_425006 123 29 � � VBZ 10_1101-2021_01_02_425006 123 30 � � VBZ 10_1101-2021_01_02_425006 123 31 � � JJ 10_1101-2021_01_02_425006 123 32 � � NNS 10_1101-2021_01_02_425006 123 33 � � NNP 10_1101-2021_01_02_425006 123 34 ( ( -LRB- 10_1101-2021_01_02_425006 123 35 � � NNP 10_1101-2021_01_02_425006 123 36 ) ) -RRB- 10_1101-2021_01_02_425006 123 37 = = NFP 10_1101-2021_01_02_425006 123 38 0 0 NFP 10_1101-2021_01_02_425006 123 39 . . . 
10_1101-2021_01_02_425006 124 1 203 203 CD 10_1101-2021_01_02_425006 124 2 Finally finally RB 10_1101-2021_01_02_425006 124 3 , , , 10_1101-2021_01_02_425006 124 4 the the DT 10_1101-2021_01_02_425006 124 5 precision precision NN 10_1101-2021_01_02_425006 124 6 and and CC 10_1101-2021_01_02_425006 124 7 recall recall NN 10_1101-2021_01_02_425006 124 8 based base VBN 10_1101-2021_01_02_425006 124 9 on on IN 10_1101-2021_01_02_425006 124 10 relaxed relaxed JJ 10_1101-2021_01_02_425006 124 11 matching matching NN 10_1101-2021_01_02_425006 124 12 are be VBP 10_1101-2021_01_02_425006 124 13 calculated calculate VBN 10_1101-2021_01_02_425006 124 14 by by IN 10_1101-2021_01_02_425006 124 15 the the DT 10_1101-2021_01_02_425006 124 16 following follow VBG 10_1101-2021_01_02_425006 124 17 formula formula NN 10_1101-2021_01_02_425006 124 18 : : : 10_1101-2021_01_02_425006 124 19 204 204 CD 10_1101-2021_01_02_425006 124 20 � � NNS 10_1101-2021_01_02_425006 124 21 � � NNS 10_1101-2021_01_02_425006 124 22 � � JJ 10_1101-2021_01_02_425006 124 23 � � NNS 10_1101-2021_01_02_425006 124 24 � � VBZ 10_1101-2021_01_02_425006 124 25 � � VBZ 10_1101-2021_01_02_425006 124 26 � � JJ 10_1101-2021_01_02_425006 124 27 � � NNS 10_1101-2021_01_02_425006 124 28 � � NNP 10_1101-2021_01_02_425006 124 29 = = SYM 10_1101-2021_01_02_425006 124 30 ∑ ∑ . 
10_1101-2021_01_02_425006 124 31 � � JJ 10_1101-2021_01_02_425006 124 32 � � ADD 10_1101-2021_01_02_425006 124 33 � � NNS 10_1101-2021_01_02_425006 124 34 � � VBZ 10_1101-2021_01_02_425006 124 35 � � VBZ 10_1101-2021_01_02_425006 124 36 � � VBZ 10_1101-2021_01_02_425006 124 37 � � VBZ 10_1101-2021_01_02_425006 124 38 � � JJ 10_1101-2021_01_02_425006 124 39 � � NNS 10_1101-2021_01_02_425006 124 40 � � NNP 10_1101-2021_01_02_425006 124 41 ( ( -LRB- 10_1101-2021_01_02_425006 124 42 � � NNP 10_1101-2021_01_02_425006 124 43 ) ) -RRB- 10_1101-2021_01_02_425006 124 44 � � VBZ 10_1101-2021_01_02_425006 124 45 ∈ ∈ JJ 10_1101-2021_01_02_425006 124 46 � � NNS 10_1101-2021_01_02_425006 124 47 � � NNS 10_1101-2021_01_02_425006 124 48 � � JJ 10_1101-2021_01_02_425006 124 49 � � NNS 
10_1101-2021_01_02_425006 125 25 � � NNS 10_1101-2021_01_02_425006 125 26 � � , 10_1101-2021_01_02_425006 125 27 � � NNS 10_1101-2021_01_02_425006 125 28 � � JJ 10_1101-2021_01_02_425006 125 29 � � NNS 10_1101-2021_01_02_425006 125 30 � � NNP 10_1101-2021_01_02_425006 125 31 = = SYM 10_1101-2021_01_02_425006 125 32 ∑ ∑ . 10_1101-2021_01_02_425006 125 33 � � JJ 10_1101-2021_01_02_425006 125 34 � � ADD 10_1101-2021_01_02_425006 125 35 � � NNS 10_1101-2021_01_02_425006 125 36 � � VBZ 10_1101-2021_01_02_425006 125 37 � � VBZ 10_1101-2021_01_02_425006 125 38 � � VBZ 10_1101-2021_01_02_425006 125 39 � � VBZ 10_1101-2021_01_02_425006 125 40 � � JJ 10_1101-2021_01_02_425006 125 41 � � NNS 10_1101-2021_01_02_425006 125 42 � � NNP 10_1101-2021_01_02_425006 125 43 ( ( -LRB- 10_1101-2021_01_02_425006 125 44 � � NNP 10_1101-2021_01_02_425006 125 45 ) ) -RRB- 10_1101-2021_01_02_425006 125 46 � � VBZ 10_1101-2021_01_02_425006 125 47 ∈ ∈ JJ 10_1101-2021_01_02_425006 125 48 � � NNS 10_1101-2021_01_02_425006 125 49 � � NNS 10_1101-2021_01_02_425006 125 50 � � NNS 10_1101-2021_01_02_425006 125 51 � � NNP 10_1101-2021_01_02_425006 125 52 ( ( -LRB- 10_1101-2021_01_02_425006 125 53 9 9 CD 10_1101-2021_01_02_425006 125 54 ) ) -RRB- 10_1101-2021_01_02_425006 125 55 where where WRB 10_1101-2021_01_02_425006 125 56 � � NNP 10_1101-2021_01_02_425006 125 57 � � NNP 10_1101-2021_01_02_425006 125 
58 is be VBZ 10_1101-2021_01_02_425006 125 59 the the DT 10_1101-2021_01_02_425006 125 60 set set NN 10_1101-2021_01_02_425006 125 61 of of IN 10_1101-2021_01_02_425006 125 62 predicted predict VBN 10_1101-2021_01_02_425006 125 63 ATUs atu NNS 10_1101-2021_01_02_425006 125 64 , , , 10_1101-2021_01_02_425006 125 65 � � NNP 10_1101-2021_01_02_425006 125 66 � � NNP 10_1101-2021_01_02_425006 125 67 is be VBZ 10_1101-2021_01_02_425006 125 68 the the DT 10_1101-2021_01_02_425006 125 69 number number NN 10_1101-2021_01_02_425006 125 70 of of IN 10_1101-2021_01_02_425006 125 71 predicted predict VBN 10_1101-2021_01_02_425006 125 72 ATUs atu NNS 10_1101-2021_01_02_425006 125 73 , , , 10_1101-2021_01_02_425006 125 74 � � NNP 10_1101-2021_01_02_425006 125 75 � � NNP 10_1101-2021_01_02_425006 125 76 is be VBZ 10_1101-2021_01_02_425006 125 77 the the DT 10_1101-2021_01_02_425006 125 78 set set NN 10_1101-2021_01_02_425006 125 79 of of IN 10_1101-2021_01_02_425006 125 80 evaluated evaluate VBN 10_1101-2021_01_02_425006 125 81 205 205 CD 10_1101-2021_01_02_425006 125 82 ATUs atu NNS 10_1101-2021_01_02_425006 125 83 , , , 10_1101-2021_01_02_425006 125 84 and and CC 10_1101-2021_01_02_425006 125 85 � � NNP 10_1101-2021_01_02_425006 125 86 � � NNP 10_1101-2021_01_02_425006 125 87 is be VBZ 10_1101-2021_01_02_425006 125 88 the the DT 10_1101-2021_01_02_425006 125 89 number number NN 10_1101-2021_01_02_425006 125 90 of of IN 10_1101-2021_01_02_425006 125 91 evaluated evaluated JJ 10_1101-2021_01_02_425006 125 92 ATUs atu NNS 10_1101-2021_01_02_425006 125 93 . . . 
10_1101-2021_01_02_425006 126 1 206 206 CD 10_1101-2021_01_02_425006 126 2 RESULTS results NN 10_1101-2021_01_02_425006 126 3 207 207 CD 10_1101-2021_01_02_425006 126 4 A a DT 10_1101-2021_01_02_425006 126 5 reliable reliable JJ 10_1101-2021_01_02_425006 126 6 bias bias NN 10_1101-2021_01_02_425006 126 7 rate rate NN 10_1101-2021_01_02_425006 126 8 function function NN 10_1101-2021_01_02_425006 126 9 is be VBZ 10_1101-2021_01_02_425006 126 10 acquired acquire VBN 10_1101-2021_01_02_425006 126 11 in in IN 10_1101-2021_01_02_425006 126 12 modeling model VBG 10_1101-2021_01_02_425006 126 13 non non JJ 10_1101-2021_01_02_425006 126 14 - - JJ 10_1101-2021_01_02_425006 126 15 uniform uniform JJ 10_1101-2021_01_02_425006 126 16 read read VB 10_1101-2021_01_02_425006 126 17 distribution distribution NN 10_1101-2021_01_02_425006 126 18 along along IN 10_1101-2021_01_02_425006 126 19 mRNA mrna IN 10_1101-2021_01_02_425006 126 20 208 208 CD 10_1101-2021_01_02_425006 126 21 transcripts transcript NNS 10_1101-2021_01_02_425006 126 22 209 209 CD 10_1101-2021_01_02_425006 126 23 To to TO 10_1101-2021_01_02_425006 126 24 ensure ensure VB 10_1101-2021_01_02_425006 126 25 the the DT 10_1101-2021_01_02_425006 126 26 reliability reliability NN 10_1101-2021_01_02_425006 126 27 of of IN 10_1101-2021_01_02_425006 126 28 the the DT 10_1101-2021_01_02_425006 126 29 bias bias NN 10_1101-2021_01_02_425006 126 30 rate rate NN 10_1101-2021_01_02_425006 126 31 function function NN 10_1101-2021_01_02_425006 126 32 in in IN 10_1101-2021_01_02_425006 126 33 modeling model VBG 10_1101-2021_01_02_425006 126 34 non non JJ 10_1101-2021_01_02_425006 126 35 - - JJ 10_1101-2021_01_02_425006 126 36 uniform uniform JJ 10_1101-2021_01_02_425006 126 37 read read VB 10_1101-2021_01_02_425006 126 38 distribution distribution NN 10_1101-2021_01_02_425006 126 39 , , , 10_1101-2021_01_02_425006 126 40 we -PRON- PRP 10_1101-2021_01_02_425006 126 41 selected select VBD 10_1101-2021_01_02_425006 126 42 210 210 CD 
10_1101-2021_01_02_425006 126 43 four four CD 10_1101-2021_01_02_425006 126 44 single single JJ 10_1101-2021_01_02_425006 126 45 gene gene NN 10_1101-2021_01_02_425006 126 46 mRNA mrna NN 10_1101-2021_01_02_425006 126 47 transcript transcript NNP 10_1101-2021_01_02_425006 126 48 datasets dataset VBZ 10_1101-2021_01_02_425006 126 49 randomly randomly RB 10_1101-2021_01_02_425006 126 50 from from IN 10_1101-2021_01_02_425006 126 51 the the DT 10_1101-2021_01_02_425006 126 52 two two CD 10_1101-2021_01_02_425006 126 53 evaluation evaluation NN 10_1101-2021_01_02_425006 126 54 datasets dataset VBZ 10_1101-2021_01_02_425006 126 55 211 211 CD 10_1101-2021_01_02_425006 126 56 ( ( -LRB- 10_1101-2021_01_02_425006 126 57 SMRT_M9Enrich SMRT_M9Enrich NNP 10_1101-2021_01_02_425006 126 58 and and CC 10_1101-2021_01_02_425006 126 59 SMRT_RiEnrich SMRT_RiEnrich NNP 10_1101-2021_01_02_425006 126 60 ) ) -RRB- 10_1101-2021_01_02_425006 126 61 , , , 10_1101-2021_01_02_425006 126 62 named name VBN 10_1101-2021_01_02_425006 126 63 M9Enrich_1 M9Enrich_1 NNP 10_1101-2021_01_02_425006 126 64 , , , 10_1101-2021_01_02_425006 126 65 M9Enrich_2 M9Enrich_2 NNP 10_1101-2021_01_02_425006 126 66 , , , 10_1101-2021_01_02_425006 126 67 RiEnrich_1 RiEnrich_1 NNP 10_1101-2021_01_02_425006 126 68 , , , 10_1101-2021_01_02_425006 126 69 and and CC 10_1101-2021_01_02_425006 126 70 212 212 CD 10_1101-2021_01_02_425006 126 71 RiEnrich_2 RiEnrich_2 NNS 10_1101-2021_01_02_425006 126 72 . . . 
10_1101-2021_01_02_425006 127 1 Four four CD 10_1101-2021_01_02_425006 127 2 bias bias NN 10_1101-2021_01_02_425006 127 3 rate rate NN 10_1101-2021_01_02_425006 127 4 functions function NNS 10_1101-2021_01_02_425006 127 5 , , , 10_1101-2021_01_02_425006 127 6 which which WDT 10_1101-2021_01_02_425006 127 7 are be VBP 10_1101-2021_01_02_425006 127 8 exponential exponential JJ 10_1101-2021_01_02_425006 127 9 functions function NNS 10_1101-2021_01_02_425006 127 10 , , , 10_1101-2021_01_02_425006 127 11 were be VBD 10_1101-2021_01_02_425006 127 12 generated generate VBN 10_1101-2021_01_02_425006 127 13 after after IN 10_1101-2021_01_02_425006 127 14 conducting conduct VBG 10_1101-2021_01_02_425006 127 15 213 213 CD 10_1101-2021_01_02_425006 127 16 nonlinear nonlinear JJ 10_1101-2021_01_02_425006 127 17 regression regression NN 10_1101-2021_01_02_425006 127 18 on on IN 10_1101-2021_01_02_425006 127 19 the the DT 10_1101-2021_01_02_425006 127 20 mRNA mrna NN 10_1101-2021_01_02_425006 127 21 transcripts transcript NNS 10_1101-2021_01_02_425006 127 22 across across IN 10_1101-2021_01_02_425006 127 23 these these DT 10_1101-2021_01_02_425006 127 24 four four CD 10_1101-2021_01_02_425006 127 25 datasets dataset NNS 10_1101-2021_01_02_425006 127 26 ( ( -LRB- 10_1101-2021_01_02_425006 127 27 Fig fig NN 10_1101-2021_01_02_425006 127 28 . . . 10_1101-2021_01_02_425006 128 1 2 2 LS 10_1101-2021_01_02_425006 128 2 ) ) -RRB- 10_1101-2021_01_02_425006 128 3 . . . 
10_1101-2021_01_02_425006 129 1 We -PRON- PRP 10_1101-2021_01_02_425006 129 2 found find VBD 10_1101-2021_01_02_425006 129 3 that that IN 10_1101-2021_01_02_425006 129 4 these these DT 10_1101-2021_01_02_425006 129 5 214 214 CD 10_1101-2021_01_02_425006 129 6 bias bias NN 10_1101-2021_01_02_425006 129 7 rate rate NN 10_1101-2021_01_02_425006 129 8 functions function NNS 10_1101-2021_01_02_425006 129 9 were be VBD 10_1101-2021_01_02_425006 129 10 similar similar JJ 10_1101-2021_01_02_425006 129 11 ( ( -LRB- 10_1101-2021_01_02_425006 129 12 � � NNP 10_1101-2021_01_02_425006 129 13 � � NNP 10_1101-2021_01_02_425006 129 14 > > XX 10_1101-2021_01_02_425006 129 15 0.998 0.998 CD 10_1101-2021_01_02_425006 129 16 ) ) -RRB- 10_1101-2021_01_02_425006 129 17 when when WRB 10_1101-2021_01_02_425006 129 18 we -PRON- PRP 10_1101-2021_01_02_425006 129 19 evaluated evaluate VBD 10_1101-2021_01_02_425006 129 20 the the DT 10_1101-2021_01_02_425006 129 21 R r JJ 10_1101-2021_01_02_425006 129 22 - - HYPH 10_1101-2021_01_02_425006 129 23 square square NN 10_1101-2021_01_02_425006 129 24 statistic statistic NN 10_1101-2021_01_02_425006 129 25 ( ( -LRB- 10_1101-2021_01_02_425006 129 26 for for IN 10_1101-2021_01_02_425006 129 27 more more JJR 10_1101-2021_01_02_425006 129 28 215 215 CD 10_1101-2021_01_02_425006 129 29 details detail NNS 10_1101-2021_01_02_425006 129 30 , , , 10_1101-2021_01_02_425006 129 31 see see VB 10_1101-2021_01_02_425006 129 32 method method NN 10_1101-2021_01_02_425006 129 33 S6 s6 NN 10_1101-2021_01_02_425006 129 34 and and CC 10_1101-2021_01_02_425006 129 35 table table NN 10_1101-2021_01_02_425006 129 36 S2 S2 NNP 10_1101-2021_01_02_425006 129 37 ) ) -RRB- 10_1101-2021_01_02_425006 129 38 . . . 
10_1101-2021_01_02_425006 130 1 The the DT 10_1101-2021_01_02_425006 130 2 similarity similarity NN 10_1101-2021_01_02_425006 130 3 of of IN 10_1101-2021_01_02_425006 130 4 the the DT 10_1101-2021_01_02_425006 130 5 four four CD 10_1101-2021_01_02_425006 130 6 bias bias NN 10_1101-2021_01_02_425006 130 7 rate rate NN 10_1101-2021_01_02_425006 130 8 functions function NNS 10_1101-2021_01_02_425006 130 9 indicated indicate VBD 10_1101-2021_01_02_425006 130 10 that that IN 10_1101-2021_01_02_425006 130 11 the the DT 10_1101-2021_01_02_425006 130 12 216 216 CD 10_1101-2021_01_02_425006 130 13 selection selection NN 10_1101-2021_01_02_425006 130 14 of of IN 10_1101-2021_01_02_425006 130 15 the the DT 10_1101-2021_01_02_425006 130 16 single single JJ 10_1101-2021_01_02_425006 130 17 gene gene NN 10_1101-2021_01_02_425006 130 18 mRNA mrna NN 10_1101-2021_01_02_425006 130 19 transcript transcript NNP 10_1101-2021_01_02_425006 130 20 datasets datasets NNPS 10_1101-2021_01_02_425006 130 21 had have VBD 10_1101-2021_01_02_425006 130 22 little little JJ 10_1101-2021_01_02_425006 130 23 impact impact NN 10_1101-2021_01_02_425006 130 24 on on IN 10_1101-2021_01_02_425006 130 25 modeling model VBG 10_1101-2021_01_02_425006 130 26 non non JJ 10_1101-2021_01_02_425006 130 27 - - JJ 10_1101-2021_01_02_425006 130 28 uniform uniform JJ 10_1101-2021_01_02_425006 130 29 read read VBD 10_1101-2021_01_02_425006 130 30 217 217 CD 10_1101-2021_01_02_425006 130 31 distribution distribution NN 10_1101-2021_01_02_425006 130 32 along along IN 10_1101-2021_01_02_425006 130 33 mRNA mRNA NNS 10_1101-2021_01_02_425006 130 34 transcripts transcript NNS 10_1101-2021_01_02_425006 130 35 , , , 10_1101-2021_01_02_425006 130 36 implying imply VBG 10_1101-2021_01_02_425006 130 37 the the DT 10_1101-2021_01_02_425006 130 38 universal universal JJ 10_1101-2021_01_02_425006 130 39 common common JJ 10_1101-2021_01_02_425006 130 40 non non JJ 10_1101-2021_01_02_425006 130 41 - - JJ 10_1101-2021_01_02_425006 
130 42 uniform uniform JJ 10_1101-2021_01_02_425006 130 43 read read VBD 10_1101-2021_01_02_425006 130 44 distribution distribution NN 10_1101-2021_01_02_425006 130 45 of of IN 10_1101-2021_01_02_425006 130 46 218 218 CD 10_1101-2021_01_02_425006 130 47 different different JJ 10_1101-2021_01_02_425006 130 48 mRNA mRNA NNS 10_1101-2021_01_02_425006 130 49 transcripts transcript NNS 10_1101-2021_01_02_425006 130 50 of of IN 10_1101-2021_01_02_425006 130 51 E. E. NNP 10_1101-2021_01_02_425006 130 52 coli coli NNS 10_1101-2021_01_02_425006 130 53 . . . 10_1101-2021_01_02_425006 131 1 Specifically specifically RB 10_1101-2021_01_02_425006 131 2 , , , 10_1101-2021_01_02_425006 131 3 we -PRON- PRP 10_1101-2021_01_02_425006 131 4 used use VBD 10_1101-2021_01_02_425006 131 5 the the DT 10_1101-2021_01_02_425006 131 6 average average NN 10_1101-2021_01_02_425006 131 7 of of IN 10_1101-2021_01_02_425006 131 8 these these DT 10_1101-2021_01_02_425006 131 9 four four CD 10_1101-2021_01_02_425006 131 10 coefficients coefficient NNS 10_1101-2021_01_02_425006 131 11 as as IN 10_1101-2021_01_02_425006 131 12 the the DT 10_1101-2021_01_02_425006 131 13 219 219 CD 10_1101-2021_01_02_425006 131 14 final final JJ 10_1101-2021_01_02_425006 131 15 coefficients coefficient NNS 10_1101-2021_01_02_425006 131 16 of of IN 10_1101-2021_01_02_425006 131 17 the the DT 10_1101-2021_01_02_425006 131 18 exponential exponential JJ 10_1101-2021_01_02_425006 131 19 function function NN 10_1101-2021_01_02_425006 131 20 , , , 10_1101-2021_01_02_425006 131 21 which which WDT 10_1101-2021_01_02_425006 131 22 was be VBD 10_1101-2021_01_02_425006 131 23 � � NNP 10_1101-2021_01_02_425006 131 24 ( ( -LRB- 10_1101-2021_01_02_425006 131 25 � � NNP 10_1101-2021_01_02_425006 131 26 ) ) -RRB- 10_1101-2021_01_02_425006 131 27 = = NFP 10_1101-2021_01_02_425006 131 28 � � ADD 10_1101-2021_01_02_425006 131 29 � � , 10_1101-2021_01_02_425006 131 30 � � NNS 10_1101-2021_01_02_425006 131 31 � � NN 
10_1101-2021_01_02_425006 131 32 with with IN 10_1101-2021_01_02_425006 131 33 � � NNP 10_1101-2021_01_02_425006 131 34 = = SYM 10_1101-2021_01_02_425006 131 35 0.256 0.256 CD 10_1101-2021_01_02_425006 131 36 and and CC 10_1101-2021_01_02_425006 131 37 � � NNP 10_1101-2021_01_02_425006 131 38 = = SYM 10_1101-2021_01_02_425006 131 39 220 220 CD 10_1101-2021_01_02_425006 131 40 0.00128 0.00128 CD 10_1101-2021_01_02_425006 131 41 . . . 10_1101-2021_01_02_425006 132 1 221 221 CD 
10_1101-2021_01_02_425006 133 25 Please please UH 10_1101-2021_01_02_425006 133 26 place place VB 10_1101-2021_01_02_425006 133 27 Fig Fig NNP 10_1101-2021_01_02_425006 133 28 . . . 10_1101-2021_01_02_425006 134 1 2 2 CD 10_1101-2021_01_02_425006 134 2 here here RB 10_1101-2021_01_02_425006 134 3 . . 
. 10_1101-2021_01_02_425006 135 1 222 222 CD 10_1101-2021_01_02_425006 135 2 ATUs atu NNS 10_1101-2021_01_02_425006 135 3 predicted predict VBN 10_1101-2021_01_02_425006 135 4 by by IN 10_1101-2021_01_02_425006 135 5 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 135 6 reach reach VB 10_1101-2021_01_02_425006 135 7 precision precision NN 10_1101-2021_01_02_425006 135 8 and and CC 10_1101-2021_01_02_425006 135 9 recall recall NN 10_1101-2021_01_02_425006 135 10 over over IN 10_1101-2021_01_02_425006 135 11 0.64 0.64 CD 10_1101-2021_01_02_425006 135 12 223 223 CD 10_1101-2021_01_02_425006 135 13 The the DT 10_1101-2021_01_02_425006 135 14 performance performance NN 10_1101-2021_01_02_425006 135 15 evaluation evaluation NN 10_1101-2021_01_02_425006 135 16 was be VBD 10_1101-2021_01_02_425006 135 17 conducted conduct VBN 10_1101-2021_01_02_425006 135 18 by by IN 10_1101-2021_01_02_425006 135 19 comparing compare VBG 10_1101-2021_01_02_425006 135 20 the the DT 10_1101-2021_01_02_425006 135 21 predicted predict VBN 10_1101-2021_01_02_425006 135 22 ATUs atu NNS 10_1101-2021_01_02_425006 135 23 with with IN 10_1101-2021_01_02_425006 135 24 the the DT 10_1101-2021_01_02_425006 135 25 ATUs atu NNS 10_1101-2021_01_02_425006 135 26 in in IN 10_1101-2021_01_02_425006 135 27 224 224 CD 10_1101-2021_01_02_425006 135 28 SMRT_M9Enrich SMRT_M9Enrich NNP 10_1101-2021_01_02_425006 135 29 and and CC 10_1101-2021_01_02_425006 135 30 SMRT_RiEnrich SMRT_RiEnrich NNP 10_1101-2021_01_02_425006 135 31 , , , 10_1101-2021_01_02_425006 135 32 which which WDT 10_1101-2021_01_02_425006 135 33 were be VBD 10_1101-2021_01_02_425006 135 34 generated generate VBN 10_1101-2021_01_02_425006 135 35 based base VBN 10_1101-2021_01_02_425006 135 36 on on IN 10_1101-2021_01_02_425006 135 37 the the DT 10_1101-2021_01_02_425006 135 38 third third JJ 10_1101-2021_01_02_425006 135 39 - - HYPH 10_1101-2021_01_02_425006 135 40 generation generation NN 10_1101-2021_01_02_425006 135 41 sequencing sequencing NN 
10_1101-2021_01_02_425006 135 42 225 225 CD 10_1101-2021_01_02_425006 135 43 and and CC 10_1101-2021_01_02_425006 135 44 are be VBP 10_1101-2021_01_02_425006 135 45 not not RB 10_1101-2021_01_02_425006 135 46 sensitive sensitive JJ 10_1101-2021_01_02_425006 135 47 to to IN 10_1101-2021_01_02_425006 135 48 transcripts transcript NNS 10_1101-2021_01_02_425006 135 49 with with IN 10_1101-2021_01_02_425006 135 50 low low JJ 10_1101-2021_01_02_425006 135 51 expression expression NN 10_1101-2021_01_02_425006 135 52 levels level NNS 10_1101-2021_01_02_425006 135 53 . . . 10_1101-2021_01_02_425006 136 1 For for IN 10_1101-2021_01_02_425006 136 2 a a DT 10_1101-2021_01_02_425006 136 3 more more RBR 10_1101-2021_01_02_425006 136 4 accurate accurate JJ 10_1101-2021_01_02_425006 136 5 and and CC 10_1101-2021_01_02_425006 136 6 fair fair JJ 10_1101-2021_01_02_425006 136 7 evaluation evaluation NN 10_1101-2021_01_02_425006 136 8 , , , 10_1101-2021_01_02_425006 136 9 226 226 CD 10_1101-2021_01_02_425006 136 10 maximal maximal JJ 10_1101-2021_01_02_425006 136 11 ATU ATU NNP 10_1101-2021_01_02_425006 136 12 clusters cluster NNS 10_1101-2021_01_02_425006 136 13 after after IN 10_1101-2021_01_02_425006 136 14 pre pre JJ 10_1101-2021_01_02_425006 136 15 - - NN 10_1101-2021_01_02_425006 136 16 selection selection NN 10_1101-2021_01_02_425006 136 17 were be VBD 10_1101-2021_01_02_425006 136 18 retained retain VBN 10_1101-2021_01_02_425006 136 19 in in IN 10_1101-2021_01_02_425006 136 20 the the DT 10_1101-2021_01_02_425006 136 21 subsequent subsequent JJ 10_1101-2021_01_02_425006 136 22 evaluations evaluation NNS 10_1101-2021_01_02_425006 136 23 ( ( -LRB- 10_1101-2021_01_02_425006 136 24 more more JJR 10_1101-2021_01_02_425006 136 25 details detail NNS 10_1101-2021_01_02_425006 136 26 227 227 CD 10_1101-2021_01_02_425006 136 27 about about IN 10_1101-2021_01_02_425006 136 28 the the DT 10_1101-2021_01_02_425006 136 29 pre pre JJ 10_1101-2021_01_02_425006 136 30 - - NN 
10_1101-2021_01_02_425006 136 31 selection selection NN 10_1101-2021_01_02_425006 136 32 of of IN 10_1101-2021_01_02_425006 136 33 maximal maximal JJ 10_1101-2021_01_02_425006 136 34 ATU ATU NNP 10_1101-2021_01_02_425006 136 35 clusters cluster NNS 10_1101-2021_01_02_425006 136 36 can can MD 10_1101-2021_01_02_425006 136 37 be be VB 10_1101-2021_01_02_425006 136 38 seen see VBN 10_1101-2021_01_02_425006 136 39 in in IN 10_1101-2021_01_02_425006 136 40 method method NN 10_1101-2021_01_02_425006 136 41 S7 S7 NNP 10_1101-2021_01_02_425006 136 42 and and CC 10_1101-2021_01_02_425006 136 43 fig fig VBP 10_1101-2021_01_02_425006 136 44 . . . 10_1101-2021_01_02_425006 137 1 S3 S3 NNP 10_1101-2021_01_02_425006 137 2 ) ) -RRB- 10_1101-2021_01_02_425006 137 3 . . . 10_1101-2021_01_02_425006 138 1 228 228 CD 10_1101-2021_01_02_425006 138 2 The the DT 10_1101-2021_01_02_425006 138 3 precision precision NN 10_1101-2021_01_02_425006 138 4 and and CC 10_1101-2021_01_02_425006 138 5 recall recall NN 10_1101-2021_01_02_425006 138 6 of of IN 10_1101-2021_01_02_425006 138 7 the the DT 10_1101-2021_01_02_425006 138 8 predicted predict VBN 10_1101-2021_01_02_425006 138 9 ATUs atu NNS 10_1101-2021_01_02_425006 138 10 were be VBD 10_1101-2021_01_02_425006 138 11 calculated calculate VBN 10_1101-2021_01_02_425006 138 12 for for IN 10_1101-2021_01_02_425006 138 13 each each DT 10_1101-2021_01_02_425006 138 14 maximal maximal JJ 10_1101-2021_01_02_425006 138 15 ATU ATU NNP 10_1101-2021_01_02_425006 138 16 cluster cluster NN 10_1101-2021_01_02_425006 138 17 . . . 
10_1101-2021_01_02_425006 139 1 By by IN 10_1101-2021_01_02_425006 139 2 229 229 CD 10_1101-2021_01_02_425006 139 3 considering consider VBG 10_1101-2021_01_02_425006 139 4 only only RB 10_1101-2021_01_02_425006 139 5 perfect perfect JJ 10_1101-2021_01_02_425006 139 6 matching matching NN 10_1101-2021_01_02_425006 139 7 , , , 10_1101-2021_01_02_425006 139 8 the the DT 10_1101-2021_01_02_425006 139 9 average average JJ 10_1101-2021_01_02_425006 139 10 precision precision NN 10_1101-2021_01_02_425006 139 11 and and CC 10_1101-2021_01_02_425006 139 12 recall recall NN 10_1101-2021_01_02_425006 139 13 were be VBD 10_1101-2021_01_02_425006 139 14 0.67 0.67 CD 10_1101-2021_01_02_425006 139 15 and and CC 10_1101-2021_01_02_425006 139 16 0.67 0.67 CD 10_1101-2021_01_02_425006 139 17 for for IN 10_1101-2021_01_02_425006 139 18 230 230 CD 10_1101-2021_01_02_425006 139 19 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 139 20 and and CC 10_1101-2021_01_02_425006 139 21 0.64 0.64 CD 10_1101-2021_01_02_425006 139 22 and and CC 10_1101-2021_01_02_425006 139 23 0.68 0.68 CD 10_1101-2021_01_02_425006 139 24 for for IN 10_1101-2021_01_02_425006 139 25 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 139 26 , , , 10_1101-2021_01_02_425006 139 27 respectively respectively RB 10_1101-2021_01_02_425006 139 28 . . . 
10_1101-2021_01_02_425006 140 1 When when WRB 10_1101-2021_01_02_425006 140 2 using use VBG 10_1101-2021_01_02_425006 140 3 relaxed relaxed JJ 10_1101-2021_01_02_425006 140 4 matching matching NN 10_1101-2021_01_02_425006 140 5 , , , 10_1101-2021_01_02_425006 140 6 the the DT 10_1101-2021_01_02_425006 140 7 231 231 CD 10_1101-2021_01_02_425006 140 8 average average JJ 10_1101-2021_01_02_425006 140 9 precision precision NN 10_1101-2021_01_02_425006 140 10 and and CC 10_1101-2021_01_02_425006 140 11 recall recall NN 10_1101-2021_01_02_425006 140 12 increased increase VBD 10_1101-2021_01_02_425006 140 13 to to IN 10_1101-2021_01_02_425006 140 14 0.77 0.77 CD 10_1101-2021_01_02_425006 140 15 and and CC 10_1101-2021_01_02_425006 140 16 0.75 0.75 CD 10_1101-2021_01_02_425006 140 17 for for IN 10_1101-2021_01_02_425006 140 18 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 140 19 and and CC 10_1101-2021_01_02_425006 140 20 0.74 0.74 CD 10_1101-2021_01_02_425006 140 21 and and CC 10_1101-2021_01_02_425006 140 22 0.76 0.76 CD 10_1101-2021_01_02_425006 140 23 for for IN 10_1101-2021_01_02_425006 140 24 232 232 CD 10_1101-2021_01_02_425006 140 25 RiEnrich_Seq rienrich_seq NN 10_1101-2021_01_02_425006 140 26 , , , 10_1101-2021_01_02_425006 140 27 respectively respectively RB 10_1101-2021_01_02_425006 140 28 . . . 
10_1101-2021_01_02_425006 141 1 The the DT 10_1101-2021_01_02_425006 141 2 statistics statistic NNS 10_1101-2021_01_02_425006 141 3 for for IN 10_1101-2021_01_02_425006 141 4 precision precision NN 10_1101-2021_01_02_425006 141 5 and and CC 10_1101-2021_01_02_425006 141 6 recall recall NN 10_1101-2021_01_02_425006 141 7 on on IN 10_1101-2021_01_02_425006 141 8 maximal maximal JJ 10_1101-2021_01_02_425006 141 9 ATU ATU NNP 10_1101-2021_01_02_425006 141 10 clusters cluster NNS 10_1101-2021_01_02_425006 141 11 with with IN 10_1101-2021_01_02_425006 141 12 233 233 CD 10_1101-2021_01_02_425006 141 13 different different JJ 10_1101-2021_01_02_425006 141 14 sizes size NNS 10_1101-2021_01_02_425006 141 15 , , , 10_1101-2021_01_02_425006 141 16 as as IN 10_1101-2021_01_02_425006 141 17 shown show VBN 10_1101-2021_01_02_425006 141 18 in in IN 10_1101-2021_01_02_425006 141 19 Fig Fig NNP 10_1101-2021_01_02_425006 141 20 . . . 10_1101-2021_01_02_425006 142 1 3A 3A NNP 10_1101-2021_01_02_425006 142 2 and and CC 10_1101-2021_01_02_425006 142 3 fig fig NNP 10_1101-2021_01_02_425006 142 4 . . . 10_1101-2021_01_02_425006 143 1 S4A. s4a. NN 
10_1101-2021_01_02_425006 144 1 These these DT 10_1101-2021_01_02_425006 144 2 results result NNS 10_1101-2021_01_02_425006 144 3 showed show VBD 10_1101-2021_01_02_425006 144 4 that that IN 10_1101-2021_01_02_425006 144 5 the the DT 10_1101-2021_01_02_425006 144 6 average average JJ 10_1101-2021_01_02_425006 144 7 precision precision NN 10_1101-2021_01_02_425006 144 8 and and CC 10_1101-2021_01_02_425006 144 9 234 234 CD 10_1101-2021_01_02_425006 144 10 recall recall NN 10_1101-2021_01_02_425006 144 11 were be VBD 10_1101-2021_01_02_425006 144 12 decreasing decrease VBG 10_1101-2021_01_02_425006 144 13 with with IN 10_1101-2021_01_02_425006 144 14 the the DT 10_1101-2021_01_02_425006 144 15 increasing increase VBG 10_1101-2021_01_02_425006 144 16 size size NN 10_1101-2021_01_02_425006 144 17 of of IN 10_1101-2021_01_02_425006 144 18 maximal maximal JJ 10_1101-2021_01_02_425006 144 19 ATU ATU NNP 10_1101-2021_01_02_425006 144 20 clusters cluster NNS 10_1101-2021_01_02_425006 144 21 ( ( -LRB- 10_1101-2021_01_02_425006 144 22 other other JJ 10_1101-2021_01_02_425006 144 23 than than IN 10_1101-2021_01_02_425006 144 24 several several JJ 10_1101-2021_01_02_425006 144 25 large large JJ 10_1101-2021_01_02_425006 144 26 size size NN 10_1101-2021_01_02_425006 144 27 235 235 CD 10_1101-2021_01_02_425006 144 28 ones one NNS 10_1101-2021_01_02_425006 144 29 due due JJ 10_1101-2021_01_02_425006 144 30 to to IN 10_1101-2021_01_02_425006 144 31 their -PRON- PRP$ 10_1101-2021_01_02_425006 144 32 small small JJ 10_1101-2021_01_02_425006 144 33 number number NN 10_1101-2021_01_02_425006 144 34 of of IN 10_1101-2021_01_02_425006 144 35 counts count NNS 10_1101-2021_01_02_425006 144 36 ) ) -RRB- 10_1101-2021_01_02_425006 144 37 . . . 
10_1101-2021_01_02_425006 145 1 The the DT 10_1101-2021_01_02_425006 145 2 results result NNS 10_1101-2021_01_02_425006 145 3 also also RB 10_1101-2021_01_02_425006 145 4 indicated indicate VBD 10_1101-2021_01_02_425006 145 5 that that IN 10_1101-2021_01_02_425006 145 6 the the DT 10_1101-2021_01_02_425006 145 7 evaluation evaluation NN 10_1101-2021_01_02_425006 145 8 results result NNS 10_1101-2021_01_02_425006 145 9 based base VBN 10_1101-2021_01_02_425006 145 10 on on IN 10_1101-2021_01_02_425006 145 11 236 236 CD 10_1101-2021_01_02_425006 145 12 relaxed relaxed JJ 10_1101-2021_01_02_425006 145 13 matching matching NN 10_1101-2021_01_02_425006 145 14 were be VBD 10_1101-2021_01_02_425006 145 15 significantly significantly RB 10_1101-2021_01_02_425006 145 16 higher high JJR 10_1101-2021_01_02_425006 145 17 than than IN 10_1101-2021_01_02_425006 145 18 those those DT 10_1101-2021_01_02_425006 145 19 based base VBN 10_1101-2021_01_02_425006 145 20 on on IN 10_1101-2021_01_02_425006 145 21 perfect perfect JJ 10_1101-2021_01_02_425006 145 22 matching matching NN 10_1101-2021_01_02_425006 145 23 across across IN 10_1101-2021_01_02_425006 145 24 different different JJ 10_1101-2021_01_02_425006 145 25 sizes size NNS 10_1101-2021_01_02_425006 145 26 . . . 
10_1101-2021_01_02_425006 146 1 237 237 CD 10_1101-2021_01_02_425006 146 2 This this DT 10_1101-2021_01_02_425006 146 3 result result NN 10_1101-2021_01_02_425006 146 4 implied imply VBD 10_1101-2021_01_02_425006 146 5 that that IN 10_1101-2021_01_02_425006 146 6 the the DT 10_1101-2021_01_02_425006 146 7 incorrectly incorrectly RB 10_1101-2021_01_02_425006 146 8 predicted predict VBN 10_1101-2021_01_02_425006 146 9 ATUs atu NNS 10_1101-2021_01_02_425006 146 10 by by IN 10_1101-2021_01_02_425006 146 11 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 146 12 based base VBN 10_1101-2021_01_02_425006 146 13 on on IN 10_1101-2021_01_02_425006 146 14 perfect perfect JJ 10_1101-2021_01_02_425006 146 15 matching matching NN 10_1101-2021_01_02_425006 146 16 tended tend VBD 10_1101-2021_01_02_425006 146 17 to to IN 10_1101-2021_01_02_425006 146 18 238 238 CD 10_1101-2021_01_02_425006 146 19 have have VB 10_1101-2021_01_02_425006 146 20 strong strong JJ 10_1101-2021_01_02_425006 146 21 similarities similarity NNS 10_1101-2021_01_02_425006 146 22 with with IN 10_1101-2021_01_02_425006 146 23 the the DT 10_1101-2021_01_02_425006 146 24 ATUs atu NNS 10_1101-2021_01_02_425006 146 25 in in IN 10_1101-2021_01_02_425006 146 26 the the DT 10_1101-2021_01_02_425006 146 27 evaluation evaluation NN 10_1101-2021_01_02_425006 146 28 data datum NNS 10_1101-2021_01_02_425006 146 29 . . . 
10_1101-2021_01_02_425006 147 1 In in IN 10_1101-2021_01_02_425006 147 2 addition addition NN 10_1101-2021_01_02_425006 147 3 , , , 10_1101-2021_01_02_425006 147 4 we -PRON- PRP 10_1101-2021_01_02_425006 147 5 also also RB 10_1101-2021_01_02_425006 147 6 found find VBD 10_1101-2021_01_02_425006 147 7 that that IN 10_1101-2021_01_02_425006 147 8 more more JJR 10_1101-2021_01_02_425006 147 9 than than IN 10_1101-2021_01_02_425006 147 10 a a DT 10_1101-2021_01_02_425006 147 11 239 239 CD 10_1101-2021_01_02_425006 147 12 quarter quarter NN 10_1101-2021_01_02_425006 147 13 of of IN 10_1101-2021_01_02_425006 147 14 the the DT 10_1101-2021_01_02_425006 147 15 incorrectly incorrectly RB 10_1101-2021_01_02_425006 147 16 predicted predict VBN 10_1101-2021_01_02_425006 147 17 ATUs atu NNS 10_1101-2021_01_02_425006 147 18 ( ( -LRB- 10_1101-2021_01_02_425006 147 19 25%/29 25%/29 CD 10_1101-2021_01_02_425006 147 20 % % NN 10_1101-2021_01_02_425006 147 21 for for IN 10_1101-2021_01_02_425006 147 22 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 147 23 / / SYM 10_1101-2021_01_02_425006 147 24 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 147 25 ) ) -RRB- 10_1101-2021_01_02_425006 147 26 by by IN 10_1101-2021_01_02_425006 147 27 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 147 28 240 240 CD 
10_1101-2021_01_02_425006 148 25 based base VBN 10_1101-2021_01_02_425006 148 26 on on IN 10_1101-2021_01_02_425006 148 27 perfect perfect JJ 10_1101-2021_01_02_425006 148 28 matching matching NN 10_1101-2021_01_02_425006 148 29 matched match VBN 10_1101-2021_01_02_425006 148 30 with with IN 10_1101-2021_01_02_425006 148 31 the the DT 10_1101-2021_01_02_425006 148 32 transcription transcription NN 10_1101-2021_01_02_425006 148 33 units unit NNS 10_1101-2021_01_02_425006 148 34 in in IN 10_1101-2021_01_02_425006 148 35 RegulonDB RegulonDB NNP 10_1101-2021_01_02_425006 148 36 ( ( -LRB- 10_1101-2021_01_02_425006 148 37 19 19 CD 10_1101-2021_01_02_425006 148 38 ) ) -RRB- 10_1101-2021_01_02_425006 148 39 . . . 
10_1101-2021_01_02_425006 149 1 241 241 CD 10_1101-2021_01_02_425006 149 2 The the DT 10_1101-2021_01_02_425006 149 3 two two CD 10_1101-2021_01_02_425006 149 4 evaluation evaluation NN 10_1101-2021_01_02_425006 149 5 datasets dataset NNS 10_1101-2021_01_02_425006 149 6 ( ( -LRB- 10_1101-2021_01_02_425006 149 7 SMRT_M9Enrich SMRT_M9Enrich NNP 10_1101-2021_01_02_425006 149 8 and and CC 10_1101-2021_01_02_425006 149 9 SMRT_RiEnrich smrt_rienrich SYM 10_1101-2021_01_02_425006 149 10 ) ) -RRB- 10_1101-2021_01_02_425006 149 11 were be VBD 10_1101-2021_01_02_425006 149 12 both both DT 10_1101-2021_01_02_425006 149 13 from from IN 10_1101-2021_01_02_425006 149 14 SMRT-242 SMRT-242 NNP 10_1101-2021_01_02_425006 149 15 Cappable cappable JJ 10_1101-2021_01_02_425006 149 16 - - HYPH 10_1101-2021_01_02_425006 149 17 seq seq NN 10_1101-2021_01_02_425006 149 18 , , , 10_1101-2021_01_02_425006 149 19 while while IN 10_1101-2021_01_02_425006 149 20 one one CD 10_1101-2021_01_02_425006 149 21 of of IN 10_1101-2021_01_02_425006 149 22 the the DT 10_1101-2021_01_02_425006 149 23 processing processing NN 10_1101-2021_01_02_425006 149 24 steps step NNS 10_1101-2021_01_02_425006 149 25 of of IN 10_1101-2021_01_02_425006 149 26 the the DT 10_1101-2021_01_02_425006 149 27 technique technique NN 10_1101-2021_01_02_425006 149 28 filtered filter VBN 10_1101-2021_01_02_425006 149 29 RNA RNA NNP 10_1101-2021_01_02_425006 149 30 reads read VBZ 10_1101-2021_01_02_425006 149 31 smaller small JJR 10_1101-2021_01_02_425006 149 32 than than IN 10_1101-2021_01_02_425006 149 33 1,000 1,000 CD 10_1101-2021_01_02_425006 149 34 243 243 CD 10_1101-2021_01_02_425006 149 35 bp bp NNS 10_1101-2021_01_02_425006 149 36 ( ( -LRB- 10_1101-2021_01_02_425006 149 37 6 6 CD 10_1101-2021_01_02_425006 149 38 ) ) -RRB- 10_1101-2021_01_02_425006 149 39 , , , 10_1101-2021_01_02_425006 149 40 which which WDT 10_1101-2021_01_02_425006 149 41 indicated indicate VBD 10_1101-2021_01_02_425006 149 42 that that IN 
10_1101-2021_01_02_425006 149 43 the the DT 10_1101-2021_01_02_425006 149 44 ATUs atu NNS 10_1101-2021_01_02_425006 149 45 in in IN 10_1101-2021_01_02_425006 149 46 these these DT 10_1101-2021_01_02_425006 149 47 two two CD 10_1101-2021_01_02_425006 149 48 evaluation evaluation NN 10_1101-2021_01_02_425006 149 49 datasets dataset NNS 10_1101-2021_01_02_425006 149 50 were be VBD 10_1101-2021_01_02_425006 149 51 not not RB 10_1101-2021_01_02_425006 149 52 comprehensive comprehensive JJ 10_1101-2021_01_02_425006 149 53 . . . 10_1101-2021_01_02_425006 150 1 To to IN 10_1101-2021_01_02_425006 150 2 244 244 CD 10_1101-2021_01_02_425006 150 3 address address NN 10_1101-2021_01_02_425006 150 4 this this DT 10_1101-2021_01_02_425006 150 5 issue issue NN 10_1101-2021_01_02_425006 150 6 , , , 10_1101-2021_01_02_425006 150 7 we -PRON- PRP 10_1101-2021_01_02_425006 150 8 enriched enrich VBD 10_1101-2021_01_02_425006 150 9 the the DT 10_1101-2021_01_02_425006 150 10 evaluation evaluation NN 10_1101-2021_01_02_425006 150 11 data datum NNS 10_1101-2021_01_02_425006 150 12 by by IN 10_1101-2021_01_02_425006 150 13 adding add VBG 10_1101-2021_01_02_425006 150 14 the the DT 10_1101-2021_01_02_425006 150 15 ATUs atu NNS 10_1101-2021_01_02_425006 150 16 defined define VBN 10_1101-2021_01_02_425006 150 17 by by IN 10_1101-2021_01_02_425006 150 18 SEnd SEnd NNP 10_1101-2021_01_02_425006 150 19 - - HYPH 10_1101-2021_01_02_425006 150 20 seq seq NNP 10_1101-2021_01_02_425006 150 21 ( ( -LRB- 10_1101-2021_01_02_425006 150 22 7 7 CD 10_1101-2021_01_02_425006 150 23 ) ) -RRB- 10_1101-2021_01_02_425006 150 24 , , , 10_1101-2021_01_02_425006 150 25 as as IN 10_1101-2021_01_02_425006 150 26 245 245 CD 10_1101-2021_01_02_425006 150 27 SEnd SEnd NNP 10_1101-2021_01_02_425006 150 28 - - HYPH 10_1101-2021_01_02_425006 150 29 seq seq NNP 10_1101-2021_01_02_425006 150 30 did do VBD 10_1101-2021_01_02_425006 150 31 not not RB 10_1101-2021_01_02_425006 150 32 introduce introduce VB 
10_1101-2021_01_02_425006 150 33 any any DT 10_1101-2021_01_02_425006 150 34 filtering filtering NN 10_1101-2021_01_02_425006 150 35 based base VBN 10_1101-2021_01_02_425006 150 36 on on IN 10_1101-2021_01_02_425006 150 37 RNA RNA NNP 10_1101-2021_01_02_425006 150 38 size size NN 10_1101-2021_01_02_425006 150 39 . . . 10_1101-2021_01_02_425006 151 1 When when WRB 10_1101-2021_01_02_425006 151 2 we -PRON- PRP 10_1101-2021_01_02_425006 151 3 used use VBD 10_1101-2021_01_02_425006 151 4 the the DT 10_1101-2021_01_02_425006 151 5 new new JJ 10_1101-2021_01_02_425006 151 6 evaluation evaluation NN 10_1101-2021_01_02_425006 151 7 data datum NNS 10_1101-2021_01_02_425006 151 8 , , , 10_1101-2021_01_02_425006 151 9 the the DT 10_1101-2021_01_02_425006 151 10 246 246 CD 10_1101-2021_01_02_425006 151 11 ATUs atu NNS 10_1101-2021_01_02_425006 151 12 predicted predict VBN 10_1101-2021_01_02_425006 151 13 by by IN 10_1101-2021_01_02_425006 151 14 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 151 15 improved improve VBN 10_1101-2021_01_02_425006 151 16 by by IN 10_1101-2021_01_02_425006 151 17 15 15 CD 10_1101-2021_01_02_425006 151 18 % % NN 10_1101-2021_01_02_425006 151 19 ( ( -LRB- 10_1101-2021_01_02_425006 151 20 0.77 0.77 CD 10_1101-2021_01_02_425006 151 21 ) ) -RRB- 10_1101-2021_01_02_425006 151 22 and and CC 10_1101-2021_01_02_425006 151 23 19 19 CD 10_1101-2021_01_02_425006 151 24 % % NN 10_1101-2021_01_02_425006 151 25 ( ( -LRB- 10_1101-2021_01_02_425006 151 26 0.76 0.76 CD 10_1101-2021_01_02_425006 151 27 ) ) -RRB- 10_1101-2021_01_02_425006 151 28 in in IN 10_1101-2021_01_02_425006 151 29 terms term NNS 10_1101-2021_01_02_425006 151 30 of of IN 10_1101-2021_01_02_425006 151 31 the the DT 10_1101-2021_01_02_425006 151 32 average average JJ 10_1101-2021_01_02_425006 151 33 precision precision NN 10_1101-2021_01_02_425006 151 34 247 247 CD 10_1101-2021_01_02_425006 151 35 based base VBN 10_1101-2021_01_02_425006 151 36 on on IN 10_1101-2021_01_02_425006 151 37 perfect perfect JJ 
10_1101-2021_01_02_425006 151 38 matching matching NN 10_1101-2021_01_02_425006 151 39 for for IN 10_1101-2021_01_02_425006 151 40 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 151 41 and and CC 10_1101-2021_01_02_425006 151 42 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 151 43 , , , 10_1101-2021_01_02_425006 151 44 respectively respectively RB 10_1101-2021_01_02_425006 151 45 , , , 10_1101-2021_01_02_425006 151 46 and and CC 10_1101-2021_01_02_425006 151 47 by by IN 10_1101-2021_01_02_425006 151 48 9 9 CD 10_1101-2021_01_02_425006 151 49 % % NN 10_1101-2021_01_02_425006 151 50 ( ( -LRB- 10_1101-2021_01_02_425006 151 51 0.84 0.84 CD 10_1101-2021_01_02_425006 151 52 ) ) -RRB- 10_1101-2021_01_02_425006 151 53 and and CC 10_1101-2021_01_02_425006 151 54 248 248 CD 10_1101-2021_01_02_425006 151 55 12 12 CD 10_1101-2021_01_02_425006 151 56 % % NN 10_1101-2021_01_02_425006 151 57 ( ( -LRB- 10_1101-2021_01_02_425006 151 58 0.83 0.83 CD 10_1101-2021_01_02_425006 151 59 ) ) -RRB- 10_1101-2021_01_02_425006 151 60 based base VBN 10_1101-2021_01_02_425006 151 61 on on IN 10_1101-2021_01_02_425006 151 62 relaxed relaxed JJ 10_1101-2021_01_02_425006 151 63 matching matching NN 10_1101-2021_01_02_425006 151 64 . . . 
10_1101-2021_01_02_425006 152 1 The the DT 10_1101-2021_01_02_425006 152 2 statistics statistic NNS 10_1101-2021_01_02_425006 152 3 for for IN 10_1101-2021_01_02_425006 152 4 precision precision NN 10_1101-2021_01_02_425006 152 5 across across IN 10_1101-2021_01_02_425006 152 6 different different JJ 10_1101-2021_01_02_425006 152 7 sizes size NNS 10_1101-2021_01_02_425006 152 8 of of IN 10_1101-2021_01_02_425006 152 9 the the DT 10_1101-2021_01_02_425006 152 10 maximal maximal JJ 10_1101-2021_01_02_425006 152 11 249 249 CD 10_1101-2021_01_02_425006 152 12 ATU ATU NNP 10_1101-2021_01_02_425006 152 13 clusters cluster NNS 10_1101-2021_01_02_425006 152 14 are be VBP 10_1101-2021_01_02_425006 152 15 shown show VBN 10_1101-2021_01_02_425006 152 16 in in IN 10_1101-2021_01_02_425006 152 17 Fig Fig NNP 10_1101-2021_01_02_425006 152 18 . . . 10_1101-2021_01_02_425006 153 1 3B 3B NNP 10_1101-2021_01_02_425006 153 2 and and CC 10_1101-2021_01_02_425006 153 3 fig fig VB 10_1101-2021_01_02_425006 153 4 . . . 
10_1101-2021_01_02_425006 154 1 S4B S4B NNP 10_1101-2021_01_02_425006 154 2 , , , 10_1101-2021_01_02_425006 154 3 showing show VBG 10_1101-2021_01_02_425006 154 4 that that IN 10_1101-2021_01_02_425006 154 5 the the DT 10_1101-2021_01_02_425006 154 6 values value NNS 10_1101-2021_01_02_425006 154 7 of of IN 10_1101-2021_01_02_425006 154 8 precision precision NN 10_1101-2021_01_02_425006 154 9 based base VBN 10_1101-2021_01_02_425006 154 10 on on IN 10_1101-2021_01_02_425006 154 11 perfect perfect JJ 10_1101-2021_01_02_425006 154 12 250 250 CD 10_1101-2021_01_02_425006 154 13 matching match VBG 10_1101-2021_01_02_425006 154 14 were be VBD 10_1101-2021_01_02_425006 154 15 significantly significantly RB 10_1101-2021_01_02_425006 154 16 improved improve VBN 10_1101-2021_01_02_425006 154 17 across across IN 10_1101-2021_01_02_425006 154 18 different different JJ 10_1101-2021_01_02_425006 154 19 sizes size NNS 10_1101-2021_01_02_425006 154 20 of of IN 10_1101-2021_01_02_425006 154 21 maximal maximal JJ 10_1101-2021_01_02_425006 154 22 ATU ATU NNP 10_1101-2021_01_02_425006 154 23 clusters cluster NNS 10_1101-2021_01_02_425006 154 24 by by IN 10_1101-2021_01_02_425006 154 25 using use VBG 10_1101-2021_01_02_425006 154 26 the the DT 10_1101-2021_01_02_425006 154 27 251 251 CD 10_1101-2021_01_02_425006 154 28 evaluated evaluate VBN 10_1101-2021_01_02_425006 154 29 ATUs atu NNS 10_1101-2021_01_02_425006 154 30 from from IN 10_1101-2021_01_02_425006 154 31 SMRT SMRT NNP 10_1101-2021_01_02_425006 154 32 - - HYPH 10_1101-2021_01_02_425006 154 33 Cappable cappable JJ 10_1101-2021_01_02_425006 154 34 - - HYPH 10_1101-2021_01_02_425006 154 35 seq seq NN 10_1101-2021_01_02_425006 154 36 and and CC 10_1101-2021_01_02_425006 154 37 SEnd SEnd NNP 10_1101-2021_01_02_425006 154 38 - - HYPH 10_1101-2021_01_02_425006 154 39 seq seq NNP 10_1101-2021_01_02_425006 154 40 . . . 
10_1101-2021_01_02_425006 155 1 This this DT 10_1101-2021_01_02_425006 155 2 result result NN 10_1101-2021_01_02_425006 155 3 suggested suggest VBD 10_1101-2021_01_02_425006 155 4 that that IN 10_1101-2021_01_02_425006 155 5 the the DT 10_1101-2021_01_02_425006 155 6 ATUs atu NNS 10_1101-2021_01_02_425006 155 7 we -PRON- PRP 10_1101-2021_01_02_425006 155 8 252 252 CD 10_1101-2021_01_02_425006 155 9 predicted predict VBD 10_1101-2021_01_02_425006 155 10 , , , 10_1101-2021_01_02_425006 155 11 which which WDT 10_1101-2021_01_02_425006 155 12 were be VBD 10_1101-2021_01_02_425006 155 13 not not RB 10_1101-2021_01_02_425006 155 14 in in IN 10_1101-2021_01_02_425006 155 15 SMRT_M9Enrich SMRT_M9Enrich NNP 10_1101-2021_01_02_425006 155 16 and and CC 10_1101-2021_01_02_425006 155 17 SMRT_RiEnrich SMRT_RiEnrich NNP 10_1101-2021_01_02_425006 155 18 , , , 10_1101-2021_01_02_425006 155 19 may may MD 10_1101-2021_01_02_425006 155 20 be be VB 10_1101-2021_01_02_425006 155 21 due due IN 10_1101-2021_01_02_425006 155 22 to to IN 10_1101-2021_01_02_425006 155 23 the the DT 10_1101-2021_01_02_425006 155 24 RNA RNA NNP 10_1101-2021_01_02_425006 155 25 length length NN 10_1101-2021_01_02_425006 155 26 253 253 CD 10_1101-2021_01_02_425006 155 27 selection selection NN 10_1101-2021_01_02_425006 155 28 of of IN 10_1101-2021_01_02_425006 155 29 SMRT SMRT NNP 10_1101-2021_01_02_425006 155 30 - - HYPH 10_1101-2021_01_02_425006 155 31 Cappable cappable JJ 10_1101-2021_01_02_425006 155 32 - - HYPH 10_1101-2021_01_02_425006 155 33 seq seq NN 10_1101-2021_01_02_425006 155 34 . . . 
10_1101-2021_01_02_425006 156 1 We -PRON- PRP 10_1101-2021_01_02_425006 156 2 enriched enrich VBD 10_1101-2021_01_02_425006 156 3 the the DT 10_1101-2021_01_02_425006 156 4 evaluation evaluation NN 10_1101-2021_01_02_425006 156 5 data datum NNS 10_1101-2021_01_02_425006 156 6 by by IN 10_1101-2021_01_02_425006 156 7 adding add VBG 10_1101-2021_01_02_425006 156 8 the the DT 10_1101-2021_01_02_425006 156 9 ATUs ATUs NNPS 10_1101-2021_01_02_425006 156 10 in in IN 10_1101-2021_01_02_425006 156 11 RegulonDB regulondb NN 10_1101-2021_01_02_425006 156 12 254 254 CD 10_1101-2021_01_02_425006 156 13 ( ( -LRB- 10_1101-2021_01_02_425006 156 14 19 19 CD 10_1101-2021_01_02_425006 156 15 ) ) -RRB- 10_1101-2021_01_02_425006 156 16 and and CC 10_1101-2021_01_02_425006 156 17 also also RB 10_1101-2021_01_02_425006 156 18 found find VBD 10_1101-2021_01_02_425006 156 19 the the DT 10_1101-2021_01_02_425006 156 20 improvement improvement NN 10_1101-2021_01_02_425006 156 21 of of IN 10_1101-2021_01_02_425006 156 22 precision precision NN 10_1101-2021_01_02_425006 156 23 across across IN 10_1101-2021_01_02_425006 156 24 different different JJ 10_1101-2021_01_02_425006 156 25 sizes size NNS 10_1101-2021_01_02_425006 156 26 of of IN 10_1101-2021_01_02_425006 156 27 maximal maximal JJ 10_1101-2021_01_02_425006 156 28 ATU ATU NNP 10_1101-2021_01_02_425006 156 29 clusters cluster NNS 10_1101-2021_01_02_425006 156 30 for for IN 10_1101-2021_01_02_425006 156 31 255 255 CD 10_1101-2021_01_02_425006 156 32 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 156 33 and and CC 10_1101-2021_01_02_425006 156 34 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 156 35 ( ( -LRB- 10_1101-2021_01_02_425006 156 36 fig fig NN 10_1101-2021_01_02_425006 156 37 . . . 10_1101-2021_01_02_425006 157 1 S4C S4C NNP 10_1101-2021_01_02_425006 157 2 ) ) -RRB- 10_1101-2021_01_02_425006 157 3 . . . 
10_1101-2021_01_02_425006 158 1 256 256 CD 10_1101-2021_01_02_425006 158 2 Furthermore furthermore RB 10_1101-2021_01_02_425006 158 3 , , , 10_1101-2021_01_02_425006 158 4 to to TO 10_1101-2021_01_02_425006 158 5 facilitate facilitate VB 10_1101-2021_01_02_425006 158 6 the the DT 10_1101-2021_01_02_425006 158 7 understanding understanding NN 10_1101-2021_01_02_425006 158 8 of of IN 10_1101-2021_01_02_425006 158 9 the the DT 10_1101-2021_01_02_425006 158 10 performance performance NN 10_1101-2021_01_02_425006 158 11 of of IN 10_1101-2021_01_02_425006 158 12 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 158 13 and and CC 10_1101-2021_01_02_425006 158 14 to to TO 10_1101-2021_01_02_425006 158 15 measure measure VB 10_1101-2021_01_02_425006 158 16 the the DT 10_1101-2021_01_02_425006 158 17 257 257 CD 10_1101-2021_01_02_425006 158 18 influence influence NN 10_1101-2021_01_02_425006 158 19 of of IN 10_1101-2021_01_02_425006 158 20 the the DT 10_1101-2021_01_02_425006 158 21 maximal maximal JJ 10_1101-2021_01_02_425006 158 22 ATU ATU NNP 10_1101-2021_01_02_425006 158 23 clusters cluster NNS 10_1101-2021_01_02_425006 158 24 from from IN 10_1101-2021_01_02_425006 158 25 rSeqTU rseqtu CD 10_1101-2021_01_02_425006 158 26 on on IN 10_1101-2021_01_02_425006 158 27 our -PRON- PRP$ 10_1101-2021_01_02_425006 158 28 ATU ATU NNP 10_1101-2021_01_02_425006 158 29 prediction prediction NN 10_1101-2021_01_02_425006 158 30 method method NN 10_1101-2021_01_02_425006 158 31 , , , 10_1101-2021_01_02_425006 158 32 SMRT SMRT NNP 10_1101-2021_01_02_425006 158 33 maximal maximal JJ 10_1101-2021_01_02_425006 158 34 258 258 CD 10_1101-2021_01_02_425006 158 35 ATU ATU NNP 10_1101-2021_01_02_425006 158 36 clusters cluster NNS 10_1101-2021_01_02_425006 158 37 collected collect VBN 10_1101-2021_01_02_425006 158 38 from from IN 10_1101-2021_01_02_425006 158 39 SMRT_M9Enrich SMRT_M9Enrich NNP 10_1101-2021_01_02_425006 158 40 and and CC 10_1101-2021_01_02_425006 158 41 SMRT_RiEnrich SMRT_RiEnrich NNP 
10_1101-2021_01_02_425006 158 42 ( ( -LRB- 10_1101-2021_01_02_425006 158 43 for for IN 10_1101-2021_01_02_425006 158 44 more more JJR 10_1101-2021_01_02_425006 158 45 details detail NNS 10_1101-2021_01_02_425006 158 46 , , , 10_1101-2021_01_02_425006 158 47 see see VB 10_1101-2021_01_02_425006 158 48 method method NNP 10_1101-2021_01_02_425006 158 49 S8 S8 NNP 10_1101-2021_01_02_425006 158 50 ) ) -RRB- 10_1101-2021_01_02_425006 158 51 259 259 CD 
10_1101-2021_01_02_425006 159 25 were be VBD 10_1101-2021_01_02_425006 159 26 applied apply VBN 10_1101-2021_01_02_425006 159 27 for for IN 10_1101-2021_01_02_425006 159 28 the the DT 10_1101-2021_01_02_425006 159 29 CQP CQP NNP 10_1101-2021_01_02_425006 159 30 in in IN 10_1101-2021_01_02_425006 159 31 two two CD 10_1101-2021_01_02_425006 159 32 conditions condition NNS 10_1101-2021_01_02_425006 159 33 ( ( -LRB- 10_1101-2021_01_02_425006 159 34 M9 M9 NNP 10_1101-2021_01_02_425006 159 35 minimal minimal JJ 10_1101-2021_01_02_425006 159 36 medium medium NN 10_1101-2021_01_02_425006 159 37 and and CC 10_1101-2021_01_02_425006 159 38 Rich rich JJ 10_1101-2021_01_02_425006 159 39 medium medium NN 10_1101-2021_01_02_425006 159 40 ) ) -RRB- 10_1101-2021_01_02_425006 159 41 . . . 
10_1101-2021_01_02_425006 160 1 We -PRON- PRP 10_1101-2021_01_02_425006 160 2 found find VBD 10_1101-2021_01_02_425006 160 3 that that IN 10_1101-2021_01_02_425006 160 4 260 260 CD 10_1101-2021_01_02_425006 160 5 precision precision NN 10_1101-2021_01_02_425006 160 6 and and CC 10_1101-2021_01_02_425006 160 7 recall recall NN 10_1101-2021_01_02_425006 160 8 increased increase VBD 10_1101-2021_01_02_425006 160 9 to to IN 10_1101-2021_01_02_425006 160 10 0.73 0.73 CD 10_1101-2021_01_02_425006 160 11 and and CC 10_1101-2021_01_02_425006 160 12 0.77 0.77 CD 10_1101-2021_01_02_425006 160 13 for for IN 10_1101-2021_01_02_425006 160 14 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 160 15 , , , 10_1101-2021_01_02_425006 160 16 respectively respectively RB 10_1101-2021_01_02_425006 160 17 , , , 10_1101-2021_01_02_425006 160 18 and and CC 10_1101-2021_01_02_425006 160 19 0.69 0.69 CD 10_1101-2021_01_02_425006 160 20 and and CC 10_1101-2021_01_02_425006 160 21 0.80 0.80 CD 10_1101-2021_01_02_425006 160 22 for for IN 10_1101-2021_01_02_425006 160 23 261 261 CD 10_1101-2021_01_02_425006 160 24 RiEnrich_Seq rienrich_seq NN 10_1101-2021_01_02_425006 160 25 based base VBN 10_1101-2021_01_02_425006 160 26 on on IN 10_1101-2021_01_02_425006 160 27 perfect perfect JJ 10_1101-2021_01_02_425006 160 28 matching matching NN 10_1101-2021_01_02_425006 160 29 ( ( -LRB- 10_1101-2021_01_02_425006 160 30 fig fig NN 10_1101-2021_01_02_425006 160 31 . . . 10_1101-2021_01_02_425006 161 1 S4D S4D NNP 10_1101-2021_01_02_425006 161 2 ) ) -RRB- 10_1101-2021_01_02_425006 161 3 . . . 
10_1101-2021_01_02_425006 162 1 Additionally additionally RB 10_1101-2021_01_02_425006 162 2 , , , 10_1101-2021_01_02_425006 162 3 when when WRB 10_1101-2021_01_02_425006 162 4 using use VBG 10_1101-2021_01_02_425006 162 5 relaxed relaxed JJ 10_1101-2021_01_02_425006 162 6 matching matching NN 10_1101-2021_01_02_425006 162 7 , , , 10_1101-2021_01_02_425006 162 8 262 262 CD 10_1101-2021_01_02_425006 162 9 precision precision NN 10_1101-2021_01_02_425006 162 10 and and CC 10_1101-2021_01_02_425006 162 11 recall recall NN 10_1101-2021_01_02_425006 162 12 significantly significantly RB 10_1101-2021_01_02_425006 162 13 increased increase VBD 10_1101-2021_01_02_425006 162 14 to to IN 10_1101-2021_01_02_425006 162 15 0.82 0.82 CD 10_1101-2021_01_02_425006 162 16 and and CC 10_1101-2021_01_02_425006 162 17 0.84 0.84 CD 10_1101-2021_01_02_425006 162 18 for for IN 10_1101-2021_01_02_425006 162 19 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 162 20 , , , 10_1101-2021_01_02_425006 162 21 respectively respectively RB 10_1101-2021_01_02_425006 162 22 , , , 10_1101-2021_01_02_425006 162 23 and and CC 10_1101-2021_01_02_425006 162 24 0.79 0.79 CD 10_1101-2021_01_02_425006 162 25 263 263 CD 10_1101-2021_01_02_425006 162 26 and and CC 10_1101-2021_01_02_425006 162 27 0.86 0.86 CD 10_1101-2021_01_02_425006 162 28 for for IN 10_1101-2021_01_02_425006 162 29 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 162 30 ( ( -LRB- 10_1101-2021_01_02_425006 162 31 fig fig NN 10_1101-2021_01_02_425006 162 32 . . . 10_1101-2021_01_02_425006 163 1 S4D S4D NNP 10_1101-2021_01_02_425006 163 2 ) ) -RRB- 10_1101-2021_01_02_425006 163 3 . . . 
10_1101-2021_01_02_425006 164 1 The the DT 10_1101-2021_01_02_425006 164 2 significantly significantly RB 10_1101-2021_01_02_425006 164 3 improved improve VBN 10_1101-2021_01_02_425006 164 4 results result NNS 10_1101-2021_01_02_425006 164 5 verified verify VBD 10_1101-2021_01_02_425006 164 6 the the DT 10_1101-2021_01_02_425006 164 7 ability ability NN 10_1101-2021_01_02_425006 164 8 of of IN 10_1101-2021_01_02_425006 164 9 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 164 10 264 264 CD 10_1101-2021_01_02_425006 164 11 to to TO 10_1101-2021_01_02_425006 164 12 accurately accurately RB 10_1101-2021_01_02_425006 164 13 predict predict VB 10_1101-2021_01_02_425006 164 14 ATU ATU NNP 10_1101-2021_01_02_425006 164 15 when when WRB 10_1101-2021_01_02_425006 164 16 giving give VBG 10_1101-2021_01_02_425006 164 17 more more RBR 10_1101-2021_01_02_425006 164 18 accurate accurate JJ 10_1101-2021_01_02_425006 164 19 maximal maximal JJ 10_1101-2021_01_02_425006 164 20 ATU ATU NNP 10_1101-2021_01_02_425006 164 21 clusters cluster NNS 10_1101-2021_01_02_425006 164 22 . . . 
10_1101-2021_01_02_425006 165 1 In in IN 10_1101-2021_01_02_425006 165 2 addition addition NN 10_1101-2021_01_02_425006 165 3 , , , 10_1101-2021_01_02_425006 165 4 we -PRON- PRP 10_1101-2021_01_02_425006 165 5 found find VBD 10_1101-2021_01_02_425006 165 6 that that IN 10_1101-2021_01_02_425006 165 7 265 265 CD 10_1101-2021_01_02_425006 165 8 the the DT 10_1101-2021_01_02_425006 165 9 number number NN 10_1101-2021_01_02_425006 165 10 of of IN 10_1101-2021_01_02_425006 165 11 predicted predict VBN 10_1101-2021_01_02_425006 165 12 ATUs atu NNS 10_1101-2021_01_02_425006 165 13 and and CC 10_1101-2021_01_02_425006 165 14 the the DT 10_1101-2021_01_02_425006 165 15 evaluated evaluate VBN 10_1101-2021_01_02_425006 165 16 ATUs ATUs NNPS 10_1101-2021_01_02_425006 165 17 under under IN 10_1101-2021_01_02_425006 165 18 the the DT 10_1101-2021_01_02_425006 165 19 maximal maximal JJ 10_1101-2021_01_02_425006 165 20 ATU ATU NNP 10_1101-2021_01_02_425006 165 21 cluster cluster NN 10_1101-2021_01_02_425006 165 22 with with IN 10_1101-2021_01_02_425006 165 23 the the DT 10_1101-2021_01_02_425006 165 24 same same JJ 10_1101-2021_01_02_425006 165 25 266 266 CD 10_1101-2021_01_02_425006 165 26 size size NN 10_1101-2021_01_02_425006 165 27 were be VBD 10_1101-2021_01_02_425006 165 28 similar similar JJ 10_1101-2021_01_02_425006 165 29 except except IN 10_1101-2021_01_02_425006 165 30 for for IN 10_1101-2021_01_02_425006 165 31 the the DT 10_1101-2021_01_02_425006 165 32 maximal maximal JJ 10_1101-2021_01_02_425006 165 33 size size NN 10_1101-2021_01_02_425006 165 34 ( ( -LRB- 10_1101-2021_01_02_425006 165 35 Fig fig NN 10_1101-2021_01_02_425006 165 36 . . . 
10_1101-2021_01_02_425006 166 1 3C 3c NN 10_1101-2021_01_02_425006 166 2 ) ) -RRB- 10_1101-2021_01_02_425006 166 3 , , , 10_1101-2021_01_02_425006 166 4 and and CC 10_1101-2021_01_02_425006 166 5 they -PRON- PRP 10_1101-2021_01_02_425006 166 6 were be VBD 10_1101-2021_01_02_425006 166 7 far far RB 10_1101-2021_01_02_425006 166 8 less less JJR 10_1101-2021_01_02_425006 166 9 than than IN 10_1101-2021_01_02_425006 166 10 the the DT 10_1101-2021_01_02_425006 166 11 theoretical theoretical JJ 10_1101-2021_01_02_425006 166 12 267 267 CD 10_1101-2021_01_02_425006 166 13 number number NN 10_1101-2021_01_02_425006 166 14 , , , 10_1101-2021_01_02_425006 166 15 which which WDT 10_1101-2021_01_02_425006 166 16 indicated indicate VBD 10_1101-2021_01_02_425006 166 17 that that IN 10_1101-2021_01_02_425006 166 18 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 166 19 can can MD 10_1101-2021_01_02_425006 166 20 effectively effectively RB 10_1101-2021_01_02_425006 166 21 exclude exclude VB 10_1101-2021_01_02_425006 166 22 most most JJS 10_1101-2021_01_02_425006 166 23 of of IN 10_1101-2021_01_02_425006 166 24 the the DT 10_1101-2021_01_02_425006 166 25 incorrect incorrect JJ 10_1101-2021_01_02_425006 166 26 ATUs atu NNS 10_1101-2021_01_02_425006 166 27 . . . 10_1101-2021_01_02_425006 167 1 268 268 CD 10_1101-2021_01_02_425006 167 2 Please please UH 10_1101-2021_01_02_425006 167 3 place place VB 10_1101-2021_01_02_425006 167 4 Fig Fig NNP 10_1101-2021_01_02_425006 167 5 . . . 10_1101-2021_01_02_425006 168 1 3 3 CD 10_1101-2021_01_02_425006 168 2 here here RB 10_1101-2021_01_02_425006 168 3 . . . 
10_1101-2021_01_02_425006 169 1 269 269 CD 10_1101-2021_01_02_425006 169 2 The the DT 10_1101-2021_01_02_425006 169 3 bias bias NN 10_1101-2021_01_02_425006 169 4 rate rate NN 10_1101-2021_01_02_425006 169 5 constraints constraint NNS 10_1101-2021_01_02_425006 169 6 efficiently efficiently RB 10_1101-2021_01_02_425006 169 7 improve improve VBP 10_1101-2021_01_02_425006 169 8 the the DT 10_1101-2021_01_02_425006 169 9 ability ability NN 10_1101-2021_01_02_425006 169 10 of of IN 10_1101-2021_01_02_425006 169 11 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 169 12 to to TO 10_1101-2021_01_02_425006 169 13 predict predict VB 10_1101-2021_01_02_425006 169 14 ATUs atu NNS 10_1101-2021_01_02_425006 169 15 270 270 CD 10_1101-2021_01_02_425006 169 16 We -PRON- PRP 10_1101-2021_01_02_425006 169 17 tried try VBD 10_1101-2021_01_02_425006 169 18 to to TO 10_1101-2021_01_02_425006 169 19 use use VB 10_1101-2021_01_02_425006 169 20 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 169 21 without without IN 10_1101-2021_01_02_425006 169 22 bias bias NN 10_1101-2021_01_02_425006 169 23 rate rate NN 10_1101-2021_01_02_425006 169 24 constraints constraint NNS 10_1101-2021_01_02_425006 169 25 to to TO 10_1101-2021_01_02_425006 169 26 predict predict VB 10_1101-2021_01_02_425006 169 27 the the DT 10_1101-2021_01_02_425006 169 28 ATUs atu NNS 10_1101-2021_01_02_425006 169 29 of of IN 10_1101-2021_01_02_425006 169 30 E. E. 
NNP 10_1101-2021_01_02_425006 169 31 coli coli NNS 10_1101-2021_01_02_425006 169 32 and and CC 10_1101-2021_01_02_425006 169 33 found find VBD 10_1101-2021_01_02_425006 169 34 that that IN 10_1101-2021_01_02_425006 169 35 its -PRON- PRP$ 10_1101-2021_01_02_425006 169 36 271 271 CD 10_1101-2021_01_02_425006 169 37 performance performance NN 10_1101-2021_01_02_425006 169 38 significantly significantly RB 10_1101-2021_01_02_425006 169 39 decreased decrease VBD 10_1101-2021_01_02_425006 169 40 compared compare VBN 10_1101-2021_01_02_425006 169 41 with with IN 10_1101-2021_01_02_425006 169 42 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 169 43 ( ( -LRB- 10_1101-2021_01_02_425006 169 44 Fig fig NN 10_1101-2021_01_02_425006 169 45 . . . 10_1101-2021_01_02_425006 170 1 4 4 LS 10_1101-2021_01_02_425006 170 2 and and CC 10_1101-2021_01_02_425006 170 3 fig fig NN 10_1101-2021_01_02_425006 170 4 . . . 10_1101-2021_01_02_425006 171 1 S5 S5 NNP 10_1101-2021_01_02_425006 171 2 ) ) -RRB- 10_1101-2021_01_02_425006 171 3 . . . 
10_1101-2021_01_02_425006 172 1 Specifically specifically RB 10_1101-2021_01_02_425006 172 2 , , , 10_1101-2021_01_02_425006 172 3 the the DT 10_1101-2021_01_02_425006 172 4 F-272 F-272 NNP 10_1101-2021_01_02_425006 172 5 score score NN 10_1101-2021_01_02_425006 172 6 of of IN 10_1101-2021_01_02_425006 172 7 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 172 8 without without IN 10_1101-2021_01_02_425006 172 9 bias bias NN 10_1101-2021_01_02_425006 172 10 rate rate NN 10_1101-2021_01_02_425006 172 11 constraints constraint NNS 10_1101-2021_01_02_425006 172 12 was be VBD 10_1101-2021_01_02_425006 172 13 0.69/0.68 0.69/0.68 RB 10_1101-2021_01_02_425006 172 14 based base VBN 10_1101-2021_01_02_425006 172 15 on on IN 10_1101-2021_01_02_425006 172 16 perfect perfect JJ 10_1101-2021_01_02_425006 172 17 matching matching NN 10_1101-2021_01_02_425006 172 18 for for IN 10_1101-2021_01_02_425006 172 19 273 273 CD 10_1101-2021_01_02_425006 172 20 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 172 21 / / SYM 10_1101-2021_01_02_425006 172 22 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 172 23 , , , 10_1101-2021_01_02_425006 172 24 compared compare VBN 10_1101-2021_01_02_425006 172 25 with with IN 10_1101-2021_01_02_425006 172 26 0.75/0.74 0.75/0.74 NNP 10_1101-2021_01_02_425006 172 27 for for IN 10_1101-2021_01_02_425006 172 28 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 172 29 . . . 
10_1101-2021_01_02_425006 173 1 When when WRB 10_1101-2021_01_02_425006 173 2 using use VBG 10_1101-2021_01_02_425006 173 3 relaxed relaxed JJ 10_1101-2021_01_02_425006 173 4 matching matching NN 10_1101-2021_01_02_425006 173 5 , , , 10_1101-2021_01_02_425006 173 6 the the DT 10_1101-2021_01_02_425006 173 7 274 274 CD 10_1101-2021_01_02_425006 173 8 F F NNP 10_1101-2021_01_02_425006 173 9 - - HYPH 10_1101-2021_01_02_425006 173 10 score score NN 10_1101-2021_01_02_425006 173 11 of of IN 10_1101-2021_01_02_425006 173 12 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 173 13 without without IN 10_1101-2021_01_02_425006 173 14 bias bias NN 10_1101-2021_01_02_425006 173 15 rate rate NN 10_1101-2021_01_02_425006 173 16 constraints constraint NNS 10_1101-2021_01_02_425006 173 17 was be VBD 10_1101-2021_01_02_425006 173 18 0.79/0.78 0.79/0.78 NN 10_1101-2021_01_02_425006 173 19 for for IN 10_1101-2021_01_02_425006 173 20 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 173 21 / / SYM 10_1101-2021_01_02_425006 173 22 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 173 23 , , , 10_1101-2021_01_02_425006 173 24 275 275 CD 10_1101-2021_01_02_425006 173 25 compared compare VBN 10_1101-2021_01_02_425006 173 26 with with IN 10_1101-2021_01_02_425006 173 27 0.83/0.83 0.83/0.83 NNP 10_1101-2021_01_02_425006 173 28 for for IN 10_1101-2021_01_02_425006 173 29 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 173 30 . . . 
10_1101-2021_01_02_425006 174 1 This this DT 10_1101-2021_01_02_425006 174 2 result result NN 10_1101-2021_01_02_425006 174 3 suggested suggest VBD 10_1101-2021_01_02_425006 174 4 that that IN 10_1101-2021_01_02_425006 174 5 the the DT 10_1101-2021_01_02_425006 174 6 bias bias NN 10_1101-2021_01_02_425006 174 7 rate rate NN 10_1101-2021_01_02_425006 174 8 constraints constraint NNS 10_1101-2021_01_02_425006 174 9 of of IN 10_1101-2021_01_02_425006 174 10 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 174 11 276 276 CD 10_1101-2021_01_02_425006 174 12 could could MD 10_1101-2021_01_02_425006 174 13 capture capture VB 10_1101-2021_01_02_425006 174 14 useful useful JJ 10_1101-2021_01_02_425006 174 15 information information NN 10_1101-2021_01_02_425006 174 16 about about IN 10_1101-2021_01_02_425006 174 17 the the DT 10_1101-2021_01_02_425006 174 18 non non JJ 10_1101-2021_01_02_425006 174 19 - - JJ 10_1101-2021_01_02_425006 174 20 uniform uniform JJ 10_1101-2021_01_02_425006 174 21 distribution distribution NN 10_1101-2021_01_02_425006 174 22 of of IN 10_1101-2021_01_02_425006 174 23 the the DT 10_1101-2021_01_02_425006 174 24 RNA RNA NNP 10_1101-2021_01_02_425006 174 25 - - HYPH 10_1101-2021_01_02_425006 174 26 Seq Seq NNP 10_1101-2021_01_02_425006 174 27 reads read VBZ 10_1101-2021_01_02_425006 174 28 along along IN 10_1101-2021_01_02_425006 174 29 the the DT 10_1101-2021_01_02_425006 174 30 277 277 CD 
10_1101-2021_01_02_425006 175 25 mRNA mrna NN 10_1101-2021_01_02_425006 175 26 transcripts transcript NNS 10_1101-2021_01_02_425006 175 27 ( ( -LRB- 10_1101-2021_01_02_425006 175 28 32 32 CD 10_1101-2021_01_02_425006 175 29 - - SYM 10_1101-2021_01_02_425006 175 30 35 35 CD 10_1101-2021_01_02_425006 175 31 ) ) -RRB- 10_1101-2021_01_02_425006 175 32 and and CC 10_1101-2021_01_02_425006 175 33 then then RB 10_1101-2021_01_02_425006 175 34 efficiently efficiently RB 10_1101-2021_01_02_425006 175 35 improve improve VBP 10_1101-2021_01_02_425006 175 36 the the DT 10_1101-2021_01_02_425006 175 37 ability ability NN 10_1101-2021_01_02_425006 
175 38 of of IN 10_1101-2021_01_02_425006 175 39 the the DT 10_1101-2021_01_02_425006 175 40 model model NN 10_1101-2021_01_02_425006 175 41 to to TO 10_1101-2021_01_02_425006 175 42 predict predict VB 10_1101-2021_01_02_425006 175 43 complex complex JJ 10_1101-2021_01_02_425006 175 44 278 278 CD 10_1101-2021_01_02_425006 175 45 ATUs atu NNS 10_1101-2021_01_02_425006 175 46 . . . 10_1101-2021_01_02_425006 176 1 279 279 CD 10_1101-2021_01_02_425006 176 2 Please please UH 10_1101-2021_01_02_425006 176 3 place place VB 10_1101-2021_01_02_425006 176 4 Fig Fig NNP 10_1101-2021_01_02_425006 176 5 . . . 10_1101-2021_01_02_425006 177 1 4 4 CD 10_1101-2021_01_02_425006 177 2 here here RB 10_1101-2021_01_02_425006 177 3 . . . 10_1101-2021_01_02_425006 178 1 280 280 CD 10_1101-2021_01_02_425006 178 2 ATUs atu NNS 10_1101-2021_01_02_425006 178 3 predicted predict VBN 10_1101-2021_01_02_425006 178 4 by by IN 10_1101-2021_01_02_425006 178 5 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 178 6 display display VB 10_1101-2021_01_02_425006 178 7 a a DT 10_1101-2021_01_02_425006 178 8 dynamic dynamic JJ 10_1101-2021_01_02_425006 178 9 composition composition NN 10_1101-2021_01_02_425006 178 10 and and CC 10_1101-2021_01_02_425006 178 11 overlapping overlap VBG 10_1101-2021_01_02_425006 178 12 nature nature NN 10_1101-2021_01_02_425006 178 13 281 281 CD 10_1101-2021_01_02_425006 178 14 A a DT 10_1101-2021_01_02_425006 178 15 total total NN 10_1101-2021_01_02_425006 178 16 of of IN 10_1101-2021_01_02_425006 178 17 2,973 2,973 CD 10_1101-2021_01_02_425006 178 18 distinct distinct JJ 10_1101-2021_01_02_425006 178 19 ATUs atu NNS 10_1101-2021_01_02_425006 178 20 were be VBD 10_1101-2021_01_02_425006 178 21 identified identify VBN 10_1101-2021_01_02_425006 178 22 in in IN 10_1101-2021_01_02_425006 178 23 M9 M9 NNP 10_1101-2021_01_02_425006 178 24 minimal minimal JJ 10_1101-2021_01_02_425006 178 25 medium medium NN 10_1101-2021_01_02_425006 178 26 , , , 10_1101-2021_01_02_425006 178 27 and 
and CC 10_1101-2021_01_02_425006 178 28 2,767 2,767 CD 10_1101-2021_01_02_425006 178 29 were be VBD 10_1101-2021_01_02_425006 178 30 identified identify VBN 10_1101-2021_01_02_425006 178 31 in in IN 10_1101-2021_01_02_425006 178 32 Rich Rich NNP 10_1101-2021_01_02_425006 178 33 282 282 CD 10_1101-2021_01_02_425006 178 34 medium medium NN 10_1101-2021_01_02_425006 178 35 . . . 10_1101-2021_01_02_425006 179 1 Among among IN 10_1101-2021_01_02_425006 179 2 them -PRON- PRP 10_1101-2021_01_02_425006 179 3 , , , 10_1101-2021_01_02_425006 179 4 there there EX 10_1101-2021_01_02_425006 179 5 were be VBD 10_1101-2021_01_02_425006 179 6 1,423/1,550 1,423/1,550 CD 10_1101-2021_01_02_425006 179 7 distinct distinct JJ 10_1101-2021_01_02_425006 179 8 ATUs atu NNS 10_1101-2021_01_02_425006 179 9 on on IN 10_1101-2021_01_02_425006 179 10 the the DT 10_1101-2021_01_02_425006 179 11 forward forward JJ 10_1101-2021_01_02_425006 179 12 strand strand NN 10_1101-2021_01_02_425006 179 13 and and CC 10_1101-2021_01_02_425006 179 14 1,323/1,444 1,323/1,444 NNP 10_1101-2021_01_02_425006 179 15 on on IN 10_1101-2021_01_02_425006 179 16 283 283 CD 10_1101-2021_01_02_425006 179 17 the the DT 10_1101-2021_01_02_425006 179 18 reverse reverse JJ 10_1101-2021_01_02_425006 179 19 strand strand NN 10_1101-2021_01_02_425006 179 20 for for IN 10_1101-2021_01_02_425006 179 21 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 179 22 / / SYM 10_1101-2021_01_02_425006 179 23 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 179 24 . . . 
10_1101-2021_01_02_425006 180 1 Each each DT 10_1101-2021_01_02_425006 180 2 of of IN 10_1101-2021_01_02_425006 180 3 the the DT 10_1101-2021_01_02_425006 180 4 predicted predict VBN 10_1101-2021_01_02_425006 180 5 ATUs atu NNS 10_1101-2021_01_02_425006 180 6 was be VBD 10_1101-2021_01_02_425006 180 7 comprised comprise VBN 10_1101-2021_01_02_425006 180 8 of of IN 10_1101-2021_01_02_425006 180 9 an an DT 10_1101-2021_01_02_425006 180 10 284 284 CD 10_1101-2021_01_02_425006 180 11 average average NN 10_1101-2021_01_02_425006 180 12 of of IN 10_1101-2021_01_02_425006 180 13 2.59 2.59 CD 10_1101-2021_01_02_425006 180 14 genes gene NNS 10_1101-2021_01_02_425006 180 15 , , , 10_1101-2021_01_02_425006 180 16 with with IN 10_1101-2021_01_02_425006 180 17 the the DT 10_1101-2021_01_02_425006 180 18 largest large JJS 10_1101-2021_01_02_425006 180 19 ATU atu NN 10_1101-2021_01_02_425006 180 20 containing contain VBG 10_1101-2021_01_02_425006 180 21 28 28 CD 10_1101-2021_01_02_425006 180 22 genes gene NNS 10_1101-2021_01_02_425006 180 23 across across IN 10_1101-2021_01_02_425006 180 24 the the DT 10_1101-2021_01_02_425006 180 25 two two CD 10_1101-2021_01_02_425006 180 26 conditions condition NNS 10_1101-2021_01_02_425006 180 27 . . . 10_1101-2021_01_02_425006 181 1 The the DT 10_1101-2021_01_02_425006 181 2 285 285 CD 10_1101-2021_01_02_425006 181 3 distribution distribution NN 10_1101-2021_01_02_425006 181 4 of of IN 10_1101-2021_01_02_425006 181 5 the the DT 10_1101-2021_01_02_425006 181 6 size size NN 10_1101-2021_01_02_425006 181 7 of of IN 10_1101-2021_01_02_425006 181 8 the the DT 10_1101-2021_01_02_425006 181 9 predicted predict VBN 10_1101-2021_01_02_425006 181 10 ATUs atu NNS 10_1101-2021_01_02_425006 181 11 is be VBZ 10_1101-2021_01_02_425006 181 12 shown show VBN 10_1101-2021_01_02_425006 181 13 in in IN 10_1101-2021_01_02_425006 181 14 Fig Fig NNP 10_1101-2021_01_02_425006 181 15 . . . 
10_1101-2021_01_02_425006 182 1 5A 5a RB 10_1101-2021_01_02_425006 182 2 , , , 10_1101-2021_01_02_425006 182 3 from from IN 10_1101-2021_01_02_425006 182 4 which which WDT 10_1101-2021_01_02_425006 182 5 we -PRON- PRP 10_1101-2021_01_02_425006 182 6 can can MD 10_1101-2021_01_02_425006 182 7 see see VB 10_1101-2021_01_02_425006 182 8 that that IN 10_1101-2021_01_02_425006 182 9 the the DT 10_1101-2021_01_02_425006 182 10 286 286 CD 10_1101-2021_01_02_425006 182 11 majority majority NN 10_1101-2021_01_02_425006 182 12 of of IN 10_1101-2021_01_02_425006 182 13 ATUs atu NNS 10_1101-2021_01_02_425006 182 14 ( ( -LRB- 10_1101-2021_01_02_425006 182 15 more more JJR 10_1101-2021_01_02_425006 182 16 than than IN 10_1101-2021_01_02_425006 182 17 87 87 CD 10_1101-2021_01_02_425006 182 18 % % NN 10_1101-2021_01_02_425006 182 19 ) ) -RRB- 10_1101-2021_01_02_425006 182 20 contained contain VBD 10_1101-2021_01_02_425006 182 21 fewer few JJR 10_1101-2021_01_02_425006 182 22 than than IN 10_1101-2021_01_02_425006 182 23 five five CD 10_1101-2021_01_02_425006 182 24 genes gene NNS 10_1101-2021_01_02_425006 182 25 in in IN 10_1101-2021_01_02_425006 182 26 M9 M9 NNP 10_1101-2021_01_02_425006 182 27 minimal minimal JJ 10_1101-2021_01_02_425006 182 28 medium medium NN 10_1101-2021_01_02_425006 182 29 and and CC 10_1101-2021_01_02_425006 182 30 Rich Rich NNP 10_1101-2021_01_02_425006 182 31 287 287 CD 10_1101-2021_01_02_425006 182 32 medium medium NN 10_1101-2021_01_02_425006 182 33 . . . 10_1101-2021_01_02_425006 183 1 Approximately approximately RB 10_1101-2021_01_02_425006 183 2 41 41 CD 10_1101-2021_01_02_425006 183 3 % % NN 10_1101-2021_01_02_425006 183 4 of of IN 10_1101-2021_01_02_425006 183 5 the the DT 10_1101-2021_01_02_425006 183 6 genes gene NNS 10_1101-2021_01_02_425006 183 7 in in IN 10_1101-2021_01_02_425006 183 8 E. E. 
NNP 10_1101-2021_01_02_425006 183 9 coli coli NNS 10_1101-2021_01_02_425006 183 10 were be VBD 10_1101-2021_01_02_425006 183 11 contained contain VBN 10_1101-2021_01_02_425006 183 12 in in IN 10_1101-2021_01_02_425006 183 13 more more JJR 10_1101-2021_01_02_425006 183 14 than than IN 10_1101-2021_01_02_425006 183 15 one one CD 10_1101-2021_01_02_425006 183 16 ATU ATU NNP 10_1101-2021_01_02_425006 183 17 for for IN 10_1101-2021_01_02_425006 183 18 288 288 CD 10_1101-2021_01_02_425006 183 19 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 183 20 , , , 10_1101-2021_01_02_425006 183 21 compared compare VBN 10_1101-2021_01_02_425006 183 22 to to IN 10_1101-2021_01_02_425006 183 23 43 43 CD 10_1101-2021_01_02_425006 183 24 % % NN 10_1101-2021_01_02_425006 183 25 genes gene NNS 10_1101-2021_01_02_425006 183 26 for for IN 10_1101-2021_01_02_425006 183 27 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 183 28 , , , 10_1101-2021_01_02_425006 183 29 suggesting suggest VBG 10_1101-2021_01_02_425006 183 30 that that IN 10_1101-2021_01_02_425006 183 31 the the DT 10_1101-2021_01_02_425006 183 32 ATUs atu NNS 10_1101-2021_01_02_425006 183 33 in in IN 10_1101-2021_01_02_425006 183 34 a a DT 10_1101-2021_01_02_425006 183 35 maximal maximal JJ 10_1101-2021_01_02_425006 183 36 ATU ATU NNP 10_1101-2021_01_02_425006 183 37 289 289 CD 10_1101-2021_01_02_425006 183 38 cluster cluster NN 10_1101-2021_01_02_425006 183 39 generally generally RB 10_1101-2021_01_02_425006 183 40 overlapped overlap VBD 10_1101-2021_01_02_425006 183 41 with with IN 10_1101-2021_01_02_425006 183 42 each each DT 10_1101-2021_01_02_425006 183 43 other other JJ 10_1101-2021_01_02_425006 183 44 ( ( -LRB- 10_1101-2021_01_02_425006 183 45 Fig fig NN 10_1101-2021_01_02_425006 183 46 . . . 10_1101-2021_01_02_425006 184 1 5B 5b JJ 10_1101-2021_01_02_425006 184 2 ) ) -RRB- 10_1101-2021_01_02_425006 184 3 . . . 
10_1101-2021_01_02_425006 185 1 In in IN 10_1101-2021_01_02_425006 185 2 addition addition NN 10_1101-2021_01_02_425006 185 3 , , , 10_1101-2021_01_02_425006 185 4 there there EX 10_1101-2021_01_02_425006 185 5 were be VBD 10_1101-2021_01_02_425006 185 6 1,576 1,576 CD 10_1101-2021_01_02_425006 185 7 ATU ATU NNP 10_1101-2021_01_02_425006 185 8 maximal maximal JJ 10_1101-2021_01_02_425006 185 9 290 290 CD 10_1101-2021_01_02_425006 185 10 clusters cluster NNS 10_1101-2021_01_02_425006 185 11 for for IN 10_1101-2021_01_02_425006 185 12 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 185 13 and and CC 10_1101-2021_01_02_425006 185 14 1,512 1,512 CD 10_1101-2021_01_02_425006 185 15 ATU ATU NNP 10_1101-2021_01_02_425006 185 16 maximal maximal JJ 10_1101-2021_01_02_425006 185 17 clusters cluster NNS 10_1101-2021_01_02_425006 185 18 for for IN 10_1101-2021_01_02_425006 185 19 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 185 20 . . . 10_1101-2021_01_02_425006 186 1 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 186 2 identified identify VBD 10_1101-2021_01_02_425006 186 3 a a DT 10_1101-2021_01_02_425006 186 4 291 291 CD 10_1101-2021_01_02_425006 186 5 total total NN 10_1101-2021_01_02_425006 186 6 of of IN 10_1101-2021_01_02_425006 186 7 1,977 1,977 CD 10_1101-2021_01_02_425006 186 8 identical identical JJ 10_1101-2021_01_02_425006 186 9 ATUs atu NNS 10_1101-2021_01_02_425006 186 10 under under IN 10_1101-2021_01_02_425006 186 11 the the DT 10_1101-2021_01_02_425006 186 12 two two CD 10_1101-2021_01_02_425006 186 13 conditions condition NNS 10_1101-2021_01_02_425006 186 14 , , , 10_1101-2021_01_02_425006 186 15 whereas whereas IN 10_1101-2021_01_02_425006 186 16 there there EX 10_1101-2021_01_02_425006 186 17 were be VBD 10_1101-2021_01_02_425006 186 18 1,786 1,786 CD 10_1101-2021_01_02_425006 186 19 distinct distinct JJ 10_1101-2021_01_02_425006 186 20 ATUs atu NNS 10_1101-2021_01_02_425006 186 21 . . . 
10_1101-2021_01_02_425006 187 1 Among among IN 10_1101-2021_01_02_425006 187 2 292 292 CD 10_1101-2021_01_02_425006 187 3 the the DT 10_1101-2021_01_02_425006 187 4 distinct distinct JJ 10_1101-2021_01_02_425006 187 5 ATUs atu NNS 10_1101-2021_01_02_425006 187 6 across across IN 10_1101-2021_01_02_425006 187 7 the the DT 10_1101-2021_01_02_425006 187 8 two two CD 10_1101-2021_01_02_425006 187 9 conditions condition NNS 10_1101-2021_01_02_425006 187 10 , , , 10_1101-2021_01_02_425006 187 11 394 394 CD 10_1101-2021_01_02_425006 187 12 ATUs atu NNS 10_1101-2021_01_02_425006 187 13 were be VBD 10_1101-2021_01_02_425006 187 14 from from IN 10_1101-2021_01_02_425006 187 15 the the DT 10_1101-2021_01_02_425006 187 16 same same JJ 10_1101-2021_01_02_425006 187 17 maximal maximal JJ 10_1101-2021_01_02_425006 187 18 ATU ATU NNP 10_1101-2021_01_02_425006 187 19 clusters cluster NNS 10_1101-2021_01_02_425006 187 20 in in IN 10_1101-2021_01_02_425006 187 21 the the DT 10_1101-2021_01_02_425006 187 22 293 293 CD 10_1101-2021_01_02_425006 187 23 two two CD 10_1101-2021_01_02_425006 187 24 maximal maximal JJ 10_1101-2021_01_02_425006 187 25 ATU ATU NNP 10_1101-2021_01_02_425006 187 26 cluster cluster NN 10_1101-2021_01_02_425006 187 27 datasets dataset NNS 10_1101-2021_01_02_425006 187 28 , , , 10_1101-2021_01_02_425006 187 29 and and CC 10_1101-2021_01_02_425006 187 30 the the DT 10_1101-2021_01_02_425006 187 31 rest rest NN 10_1101-2021_01_02_425006 187 32 were be VBD 10_1101-2021_01_02_425006 187 33 from from IN 10_1101-2021_01_02_425006 187 34 different different JJ 10_1101-2021_01_02_425006 187 35 maximal maximal JJ 10_1101-2021_01_02_425006 187 36 ATU ATU NNP 10_1101-2021_01_02_425006 187 37 clusters cluster NNS 10_1101-2021_01_02_425006 187 38 . . . 
10_1101-2021_01_02_425006 188 1 The the DT 10_1101-2021_01_02_425006 188 2 fact fact NN 10_1101-2021_01_02_425006 188 3 294 294 CD 10_1101-2021_01_02_425006 188 4 there there EX 10_1101-2021_01_02_425006 188 5 were be VBD 10_1101-2021_01_02_425006 188 6 distinct distinct JJ 10_1101-2021_01_02_425006 188 7 ATUs atu NNS 10_1101-2021_01_02_425006 188 8 under under IN 10_1101-2021_01_02_425006 188 9 the the DT 10_1101-2021_01_02_425006 188 10 two two CD 10_1101-2021_01_02_425006 188 11 conditions condition NNS 10_1101-2021_01_02_425006 188 12 suggests suggest VBZ 10_1101-2021_01_02_425006 188 13 that that IN 10_1101-2021_01_02_425006 188 14 ATUs atu NNS 10_1101-2021_01_02_425006 188 15 are be VBP 10_1101-2021_01_02_425006 188 16 dynamically dynamically RB 10_1101-2021_01_02_425006 188 17 responsive responsive JJ 10_1101-2021_01_02_425006 188 18 to to IN 10_1101-2021_01_02_425006 188 19 295 295 CD 
10_1101-2021_01_02_425006 189 25 different different JJ 10_1101-2021_01_02_425006 189 26 conditions condition NNS 10_1101-2021_01_02_425006 189 27 or or CC 10_1101-2021_01_02_425006 189 28 environmental environmental JJ 10_1101-2021_01_02_425006 189 29 stimuli stimulus NNS 10_1101-2021_01_02_425006 189 30 ( ( -LRB- 10_1101-2021_01_02_425006 189 31 for for IN 10_1101-2021_01_02_425006 189 32 more more JJR 10_1101-2021_01_02_425006 189 33 real real JJ 10_1101-2021_01_02_425006 189 34 examples example NNS 10_1101-2021_01_02_425006 189 35 about about IN 10_1101-2021_01_02_425006 189 36 the the DT 10_1101-2021_01_02_425006 189 37 ATUs atu NNS 10_1101-2021_01_02_425006 189 38 under under IN 10_1101-2021_01_02_425006 189 39 different different JJ 10_1101-2021_01_02_425006 189 40 296 296 CD 10_1101-2021_01_02_425006 189 41 conditions condition NNS 10_1101-2021_01_02_425006 189 42 , , , 10_1101-2021_01_02_425006 189 43 see see VB 10_1101-2021_01_02_425006 189 44 fig fig NN 10_1101-2021_01_02_425006 189 45 . . . 10_1101-2021_01_02_425006 190 1 S6 S6 NNP 10_1101-2021_01_02_425006 190 2 ) ) -RRB- 10_1101-2021_01_02_425006 190 3 . . . 
10_1101-2021_01_02_425006 191 1 297 297 CD 10_1101-2021_01_02_425006 191 2 The the DT 10_1101-2021_01_02_425006 191 3 dynamic dynamic JJ 10_1101-2021_01_02_425006 191 4 composition composition NN 10_1101-2021_01_02_425006 191 5 of of IN 10_1101-2021_01_02_425006 191 6 predicted predict VBN 10_1101-2021_01_02_425006 191 7 ATUs atu NNS 10_1101-2021_01_02_425006 191 8 by by IN 10_1101-2021_01_02_425006 191 9 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 191 10 is be VBZ 10_1101-2021_01_02_425006 191 11 of of IN 10_1101-2021_01_02_425006 191 12 great great JJ 10_1101-2021_01_02_425006 191 13 significance significance NN 10_1101-2021_01_02_425006 191 14 to to TO 10_1101-2021_01_02_425006 191 15 understand understand VB 10_1101-2021_01_02_425006 191 16 the the DT 10_1101-2021_01_02_425006 191 17 298 298 CD 10_1101-2021_01_02_425006 191 18 interactions interaction NNS 10_1101-2021_01_02_425006 191 19 inside inside IN 10_1101-2021_01_02_425006 191 20 polymicrobial polymicrobial JJ 10_1101-2021_01_02_425006 191 21 communities community NNS 10_1101-2021_01_02_425006 191 22 . . . 
10_1101-2021_01_02_425006 192 1 For for IN 10_1101-2021_01_02_425006 192 2 example example NN 10_1101-2021_01_02_425006 192 3 , , , 10_1101-2021_01_02_425006 192 4 chronic chronic JJ 10_1101-2021_01_02_425006 192 5 airway airway NN 10_1101-2021_01_02_425006 192 6 infection infection NN 10_1101-2021_01_02_425006 192 7 by by IN 10_1101-2021_01_02_425006 192 8 Pseudomonas Pseudomonas NNP 10_1101-2021_01_02_425006 192 9 299 299 CD 10_1101-2021_01_02_425006 192 10 aeruginosa aeruginosa NNP 10_1101-2021_01_02_425006 192 11 considerably considerably RB 10_1101-2021_01_02_425006 192 12 contributes contribute VBZ 10_1101-2021_01_02_425006 192 13 to to IN 10_1101-2021_01_02_425006 192 14 lung lung NN 10_1101-2021_01_02_425006 192 15 tissue tissue NN 10_1101-2021_01_02_425006 192 16 destruction destruction NN 10_1101-2021_01_02_425006 192 17 and and CC 10_1101-2021_01_02_425006 192 18 impairment impairment NN 10_1101-2021_01_02_425006 192 19 of of IN 10_1101-2021_01_02_425006 192 20 pulmonary pulmonary JJ 10_1101-2021_01_02_425006 192 21 function function NN 10_1101-2021_01_02_425006 192 22 in in IN 10_1101-2021_01_02_425006 192 23 300 300 CD 10_1101-2021_01_02_425006 192 24 cystic cystic JJ 10_1101-2021_01_02_425006 192 25 - - HYPH 10_1101-2021_01_02_425006 192 26 fibrosis fibrosis NN 10_1101-2021_01_02_425006 192 27 ( ( -LRB- 10_1101-2021_01_02_425006 192 28 CF cf NN 10_1101-2021_01_02_425006 192 29 ) ) -RRB- 10_1101-2021_01_02_425006 192 30 patients patient NNS 10_1101-2021_01_02_425006 192 31 ( ( -LRB- 10_1101-2021_01_02_425006 192 32 39 39 CD 10_1101-2021_01_02_425006 192 33 ) ) -RRB- 10_1101-2021_01_02_425006 192 34 . . . 10_1101-2021_01_02_425006 193 1 Marie Marie NNP 10_1101-2021_01_02_425006 193 2 et et NNP 10_1101-2021_01_02_425006 193 3 al al NNP 10_1101-2021_01_02_425006 193 4 . . . 
10_1101-2021_01_02_425006 194 1 found find VBD 10_1101-2021_01_02_425006 194 2 that that IN 10_1101-2021_01_02_425006 194 3 the the DT 10_1101-2021_01_02_425006 194 4 presence presence NN 10_1101-2021_01_02_425006 194 5 of of IN 10_1101-2021_01_02_425006 194 6 E. E. NNP 10_1101-2021_01_02_425006 194 7 coli coli NNS 10_1101-2021_01_02_425006 194 8 complemented complement VBD 10_1101-2021_01_02_425006 194 9 the the DT 10_1101-2021_01_02_425006 194 10 301 301 CD 10_1101-2021_01_02_425006 194 11 growth growth NN 10_1101-2021_01_02_425006 194 12 defect defect NN 10_1101-2021_01_02_425006 194 13 of of IN 10_1101-2021_01_02_425006 194 14 a a DT 10_1101-2021_01_02_425006 194 15 P. P. NNP 10_1101-2021_01_02_425006 194 16 aeruginosa aeruginosa NNP 10_1101-2021_01_02_425006 194 17 bioA bioa RB 10_1101-2021_01_02_425006 194 18 - - HYPH 10_1101-2021_01_02_425006 194 19 disrupted disrupt VBN 10_1101-2021_01_02_425006 194 20 mutant mutant NN 10_1101-2021_01_02_425006 194 21 that that WDT 10_1101-2021_01_02_425006 194 22 is be VBZ 10_1101-2021_01_02_425006 194 23 unable unable JJ 10_1101-2021_01_02_425006 194 24 to to TO 10_1101-2021_01_02_425006 194 25 grow grow VB 10_1101-2021_01_02_425006 194 26 on on IN 10_1101-2021_01_02_425006 194 27 rich rich JJ 10_1101-2021_01_02_425006 194 28 medium medium NN 10_1101-2021_01_02_425006 194 29 , , , 10_1101-2021_01_02_425006 194 30 and and CC 10_1101-2021_01_02_425006 194 31 can can MD 10_1101-2021_01_02_425006 194 32 302 302 CD 10_1101-2021_01_02_425006 194 33 be be VB 10_1101-2021_01_02_425006 194 34 beneficial beneficial JJ 10_1101-2021_01_02_425006 194 35 to to IN 10_1101-2021_01_02_425006 194 36 P. P. 
NNP 10_1101-2021_01_02_425006 194 37 aeruginosa aeruginosa NNP 10_1101-2021_01_02_425006 194 38 when when WRB 10_1101-2021_01_02_425006 194 39 biotin biotin JJ 10_1101-2021_01_02_425006 194 40 supply supply NN 10_1101-2021_01_02_425006 194 41 is be VBZ 10_1101-2021_01_02_425006 194 42 limited limit VBN 10_1101-2021_01_02_425006 194 43 ( ( -LRB- 10_1101-2021_01_02_425006 194 44 39 39 CD 10_1101-2021_01_02_425006 194 45 ) ) -RRB- 10_1101-2021_01_02_425006 194 46 . . . 10_1101-2021_01_02_425006 195 1 An an DT 10_1101-2021_01_02_425006 195 2 ATU ATU NNP 10_1101-2021_01_02_425006 195 3 with with IN 10_1101-2021_01_02_425006 195 4 a a DT 10_1101-2021_01_02_425006 195 5 high high JJ 10_1101-2021_01_02_425006 195 6 expression expression NN 10_1101-2021_01_02_425006 195 7 level level NN 10_1101-2021_01_02_425006 195 8 303 303 CD 10_1101-2021_01_02_425006 195 9 coded code VBN 10_1101-2021_01_02_425006 195 10 by by IN 10_1101-2021_01_02_425006 195 11 the the DT 10_1101-2021_01_02_425006 195 12 uvrB uvrb CD 10_1101-2021_01_02_425006 195 13 gene gene NN 10_1101-2021_01_02_425006 195 14 is be VBZ 10_1101-2021_01_02_425006 195 15 identified identify VBN 10_1101-2021_01_02_425006 195 16 by by IN 10_1101-2021_01_02_425006 195 17 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 195 18 in in IN 10_1101-2021_01_02_425006 195 19 Rich Rich NNP 10_1101-2021_01_02_425006 195 20 medium medium NN 10_1101-2021_01_02_425006 195 21 , , , 10_1101-2021_01_02_425006 195 22 while while IN 10_1101-2021_01_02_425006 195 23 it -PRON- PRP 10_1101-2021_01_02_425006 195 24 does do VBZ 10_1101-2021_01_02_425006 195 25 not not RB 10_1101-2021_01_02_425006 195 26 exist exist VB 10_1101-2021_01_02_425006 195 27 in in IN 10_1101-2021_01_02_425006 195 28 M9 M9 NNP 10_1101-2021_01_02_425006 195 29 minimal minimal JJ 10_1101-2021_01_02_425006 195 30 304 304 CD 10_1101-2021_01_02_425006 195 31 medium medium NN 10_1101-2021_01_02_425006 195 32 ( ( -LRB- 10_1101-2021_01_02_425006 195 33 Fig fig NN 
10_1101-2021_01_02_425006 195 34 . . . 10_1101-2021_01_02_425006 196 1 6 6 LS 10_1101-2021_01_02_425006 196 2 ) ) -RRB- 10_1101-2021_01_02_425006 196 3 . . . 10_1101-2021_01_02_425006 197 1 We -PRON- PRP 10_1101-2021_01_02_425006 197 2 predicted predict VBD 10_1101-2021_01_02_425006 197 3 the the DT 10_1101-2021_01_02_425006 197 4 uvrB uvrb JJ 10_1101-2021_01_02_425006 197 5 gene gene NN 10_1101-2021_01_02_425006 197 6 to to TO 10_1101-2021_01_02_425006 197 7 be be VB 10_1101-2021_01_02_425006 197 8 involved involve VBN 10_1101-2021_01_02_425006 197 9 in in IN 10_1101-2021_01_02_425006 197 10 the the DT 10_1101-2021_01_02_425006 197 11 biotin biotin JJ 10_1101-2021_01_02_425006 197 12 metabolism metabolism NN 10_1101-2021_01_02_425006 197 13 pathway pathway NN 10_1101-2021_01_02_425006 197 14 , , , 10_1101-2021_01_02_425006 197 15 as as IN 10_1101-2021_01_02_425006 197 16 the the DT 10_1101-2021_01_02_425006 197 17 305 305 CD 10_1101-2021_01_02_425006 197 18 bioB bioB NNS 10_1101-2021_01_02_425006 197 19 , , , 10_1101-2021_01_02_425006 197 20 bioF biof NN 10_1101-2021_01_02_425006 197 21 , , , 10_1101-2021_01_02_425006 197 22 bioC bioc ADD 10_1101-2021_01_02_425006 197 23 , , , 10_1101-2021_01_02_425006 197 24 and and CC 10_1101-2021_01_02_425006 197 25 bioD biod NN 10_1101-2021_01_02_425006 197 26 genes gene NNS 10_1101-2021_01_02_425006 197 27 contained contain VBN 10_1101-2021_01_02_425006 197 28 in in IN 10_1101-2021_01_02_425006 197 29 a a DT 10_1101-2021_01_02_425006 197 30 same same JJ 10_1101-2021_01_02_425006 197 31 ATU ATU NNP 10_1101-2021_01_02_425006 197 32 with with IN 10_1101-2021_01_02_425006 197 33 it -PRON- PRP 10_1101-2021_01_02_425006 197 34 have have VBP 10_1101-2021_01_02_425006 197 35 been be VBN 10_1101-2021_01_02_425006 197 36 known know VBN 10_1101-2021_01_02_425006 197 37 in in IN 10_1101-2021_01_02_425006 197 38 the the DT 10_1101-2021_01_02_425006 197 39 biotin biotin JJ 10_1101-2021_01_02_425006 197 40 306 306 CD 
10_1101-2021_01_02_425006 197 41 metabolism metabolism NN 10_1101-2021_01_02_425006 197 42 KEGG KEGG NNP 10_1101-2021_01_02_425006 197 43 pathway pathway NN 10_1101-2021_01_02_425006 197 44 . . . 10_1101-2021_01_02_425006 198 1 Therefore therefore RB 10_1101-2021_01_02_425006 198 2 , , , 10_1101-2021_01_02_425006 198 3 the the DT 10_1101-2021_01_02_425006 198 4 observation observation NN 10_1101-2021_01_02_425006 198 5 by by IN 10_1101-2021_01_02_425006 198 6 Marie Marie NNP 10_1101-2021_01_02_425006 198 7 et et NNP 10_1101-2021_01_02_425006 198 8 al al NNP 10_1101-2021_01_02_425006 198 9 . . . 10_1101-2021_01_02_425006 199 1 can can MD 10_1101-2021_01_02_425006 199 2 be be VB 10_1101-2021_01_02_425006 199 3 explained explain VBN 10_1101-2021_01_02_425006 199 4 that that IN 10_1101-2021_01_02_425006 199 5 the the DT 10_1101-2021_01_02_425006 199 6 ATUs atu NNS 10_1101-2021_01_02_425006 199 7 307 307 CD 10_1101-2021_01_02_425006 199 8 coded code VBN 10_1101-2021_01_02_425006 199 9 by by IN 10_1101-2021_01_02_425006 199 10 the the DT 10_1101-2021_01_02_425006 199 11 uvrB uvrb JJ 10_1101-2021_01_02_425006 199 12 gene gene NN 10_1101-2021_01_02_425006 199 13 of of IN 10_1101-2021_01_02_425006 199 14 E. E. NNP 10_1101-2021_01_02_425006 199 15 coli coli NNS 10_1101-2021_01_02_425006 199 16 can can MD 10_1101-2021_01_02_425006 199 17 provide provide VB 10_1101-2021_01_02_425006 199 18 the the DT 10_1101-2021_01_02_425006 199 19 biotin biotin JJ 10_1101-2021_01_02_425006 199 20 supply supply NN 10_1101-2021_01_02_425006 199 21 for for IN 10_1101-2021_01_02_425006 199 22 P. P. NNP 10_1101-2021_01_02_425006 199 23 aeruginosa aeruginosa NNP 10_1101-2021_01_02_425006 199 24 under under IN 10_1101-2021_01_02_425006 199 25 rich rich JJ 10_1101-2021_01_02_425006 199 26 medium medium NN 10_1101-2021_01_02_425006 199 27 . . . 
10_1101-2021_01_02_425006 200 1 308 308 LS 10_1101-2021_01_02_425006 200 2 This this DT 10_1101-2021_01_02_425006 200 3 result result NN 10_1101-2021_01_02_425006 200 4 showed show VBD 10_1101-2021_01_02_425006 200 5 that that IN 10_1101-2021_01_02_425006 200 6 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 200 7 could could MD 10_1101-2021_01_02_425006 200 8 increase increase VB 10_1101-2021_01_02_425006 200 9 our -PRON- PRP$ 10_1101-2021_01_02_425006 200 10 understanding understanding NN 10_1101-2021_01_02_425006 200 11 of of IN 10_1101-2021_01_02_425006 200 12 interspecies interspecie NNS 10_1101-2021_01_02_425006 200 13 competition competition NN 10_1101-2021_01_02_425006 200 14 and and CC 10_1101-2021_01_02_425006 200 15 309 309 CD 10_1101-2021_01_02_425006 200 16 cooperation cooperation NN 10_1101-2021_01_02_425006 200 17 , , , 10_1101-2021_01_02_425006 200 18 which which WDT 10_1101-2021_01_02_425006 200 19 play play VBP 10_1101-2021_01_02_425006 200 20 an an DT 10_1101-2021_01_02_425006 200 21 important important JJ 10_1101-2021_01_02_425006 200 22 role role NN 10_1101-2021_01_02_425006 200 23 in in IN 10_1101-2021_01_02_425006 200 24 shaping shape VBG 10_1101-2021_01_02_425006 200 25 the the DT 10_1101-2021_01_02_425006 200 26 composition composition NN 10_1101-2021_01_02_425006 200 27 and and CC 10_1101-2021_01_02_425006 200 28 structure structure NN 10_1101-2021_01_02_425006 200 29 of of IN 10_1101-2021_01_02_425006 200 30 polymicrobial polymicrobial JJ 10_1101-2021_01_02_425006 200 31 310 310 CD 10_1101-2021_01_02_425006 200 32 bacterial bacterial JJ 10_1101-2021_01_02_425006 200 33 populations population NNS 10_1101-2021_01_02_425006 200 34 . . . 10_1101-2021_01_02_425006 201 1 311 311 CD 10_1101-2021_01_02_425006 201 2 Please please UH 10_1101-2021_01_02_425006 201 3 place place NN 10_1101-2021_01_02_425006 201 4 Fig fig NN 10_1101-2021_01_02_425006 201 5 . . . 
10_1101-2021_01_02_425006 202 1 5 5 CD 10_1101-2021_01_02_425006 202 2 here here RB 10_1101-2021_01_02_425006 202 3 . . . 10_1101-2021_01_02_425006 203 1 312 312 CD 10_1101-2021_01_02_425006 203 2 Please please UH 10_1101-2021_01_02_425006 203 3 place place VB 10_1101-2021_01_02_425006 203 4 Fig Fig NNP 10_1101-2021_01_02_425006 203 5 . . . 10_1101-2021_01_02_425006 204 1 6 6 CD 10_1101-2021_01_02_425006 204 2 here here RB 10_1101-2021_01_02_425006 204 3 . . . 10_1101-2021_01_02_425006 205 1 313 313 CD
10_1101-2021_01_02_425006 206 25 Predicted predict VBN 10_1101-2021_01_02_425006 206 26 ATUs atu NNS 10_1101-2021_01_02_425006 206 27 by by IN 10_1101-2021_01_02_425006 206 28 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 206 29 are be VBP 10_1101-2021_01_02_425006 206 30 verified verify VBN 10_1101-2021_01_02_425006 206 31 by by IN 10_1101-2021_01_02_425006 206 32 experimental experimental JJ 10_1101-2021_01_02_425006 206 33 TSSs tss NNS 10_1101-2021_01_02_425006 206 34 and and CC 10_1101-2021_01_02_425006 206 35 TTSs tts NNS 10_1101-2021_01_02_425006 206 36 314 314 CD 10_1101-2021_01_02_425006 206 37 An an DT 10_1101-2021_01_02_425006 206 38 experimental experimental JJ 10_1101-2021_01_02_425006 206 39 TSS TSS NNP 10_1101-2021_01_02_425006 206 40 dataset dataset NN 10_1101-2021_01_02_425006 206 41 of of IN 10_1101-2021_01_02_425006 206 42 E. E. 
NNP 10_1101-2021_01_02_425006 206 43 coli coli NNS 10_1101-2021_01_02_425006 206 44 from from IN 10_1101-2021_01_02_425006 206 45 SEnd SEnd NNP 10_1101-2021_01_02_425006 206 46 - - HYPH 10_1101-2021_01_02_425006 206 47 seq seq NNP 10_1101-2021_01_02_425006 206 48 ( ( -LRB- 10_1101-2021_01_02_425006 206 49 7 7 CD 10_1101-2021_01_02_425006 206 50 ) ) -RRB- 10_1101-2021_01_02_425006 206 51 and and CC 10_1101-2021_01_02_425006 206 52 a a DT 10_1101-2021_01_02_425006 206 53 TF TF NNP 10_1101-2021_01_02_425006 206 54 binding bind VBG 10_1101-2021_01_02_425006 206 55 site site NN 10_1101-2021_01_02_425006 206 56 dataset dataset NN 10_1101-2021_01_02_425006 206 57 of of IN 10_1101-2021_01_02_425006 206 58 E. E. NNP 10_1101-2021_01_02_425006 206 59 coli coli NNS 10_1101-2021_01_02_425006 206 60 from from IN 10_1101-2021_01_02_425006 206 61 315 315 CD 10_1101-2021_01_02_425006 206 62 the the DT 10_1101-2021_01_02_425006 206 63 experimental experimental JJ 10_1101-2021_01_02_425006 206 64 dataset dataset NN 10_1101-2021_01_02_425006 206 65 of of IN 10_1101-2021_01_02_425006 206 66 RegulonDB RegulonDB NNP 10_1101-2021_01_02_425006 206 67 ( ( -LRB- 10_1101-2021_01_02_425006 206 68 19 19 CD 10_1101-2021_01_02_425006 206 69 ) ) -RRB- 10_1101-2021_01_02_425006 206 70 were be VBD 10_1101-2021_01_02_425006 206 71 used use VBN 10_1101-2021_01_02_425006 206 72 to to TO 10_1101-2021_01_02_425006 206 73 further further RB 10_1101-2021_01_02_425006 206 74 verify verify VB 10_1101-2021_01_02_425006 206 75 the the DT 10_1101-2021_01_02_425006 206 76 reliability reliability NN 10_1101-2021_01_02_425006 206 77 of of IN 10_1101-2021_01_02_425006 206 78 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 206 79 and and CC 10_1101-2021_01_02_425006 206 80 316 316 CD 10_1101-2021_01_02_425006 206 81 were be VBD 10_1101-2021_01_02_425006 206 82 named name VBN 10_1101-2021_01_02_425006 206 83 dataset dataset NNP 10_1101-2021_01_02_425006 206 84 1 1 CD 10_1101-2021_01_02_425006 206 85 and and CC 
10_1101-2021_01_02_425006 206 86 dataset dataset VBD 10_1101-2021_01_02_425006 206 87 2 2 CD 10_1101-2021_01_02_425006 206 88 , , , 10_1101-2021_01_02_425006 206 89 respectively respectively RB 10_1101-2021_01_02_425006 206 90 . . . 10_1101-2021_01_02_425006 207 1 There there EX 10_1101-2021_01_02_425006 207 2 were be VBD 10_1101-2021_01_02_425006 207 3 5,512 5,512 CD 10_1101-2021_01_02_425006 207 4 experimental experimental JJ 10_1101-2021_01_02_425006 207 5 TSSs tss NNS 10_1101-2021_01_02_425006 207 6 in in IN 10_1101-2021_01_02_425006 207 7 dataset dataset NNP 10_1101-2021_01_02_425006 207 8 1 1 CD 10_1101-2021_01_02_425006 207 9 and and CC 10_1101-2021_01_02_425006 207 10 317 317 CD 10_1101-2021_01_02_425006 207 11 3,220 3,220 CD 10_1101-2021_01_02_425006 207 12 experimental experimental JJ 10_1101-2021_01_02_425006 207 13 TF TF NNP 10_1101-2021_01_02_425006 207 14 binding bind VBG 10_1101-2021_01_02_425006 207 15 sites site NNS 10_1101-2021_01_02_425006 207 16 in in IN 10_1101-2021_01_02_425006 207 17 dataset dataset NNP 10_1101-2021_01_02_425006 207 18 2 2 CD 10_1101-2021_01_02_425006 207 19 . . . 10_1101-2021_01_02_425006 208 1 We -PRON- PRP 10_1101-2021_01_02_425006 208 2 considered consider VBD 10_1101-2021_01_02_425006 208 3 the the DT 10_1101-2021_01_02_425006 208 4 5’-end 5’-end CD 10_1101-2021_01_02_425006 208 5 genes gene NNS 10_1101-2021_01_02_425006 208 6 and and CC 10_1101-2021_01_02_425006 208 7 no no DT 10_1101-2021_01_02_425006 208 8 5’-end 5’-end CD 10_1101-2021_01_02_425006 208 9 genes gene NNS 10_1101-2021_01_02_425006 208 10 of of IN 10_1101-2021_01_02_425006 208 11 318 318 CD 10_1101-2021_01_02_425006 208 12 the the DT 10_1101-2021_01_02_425006 208 13 predicted predict VBN 10_1101-2021_01_02_425006 208 14 ATUs ATUs NNPS 10_1101-2021_01_02_425006 208 15 by by IN 10_1101-2021_01_02_425006 208 16 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 208 17 . . . 
10_1101-2021_01_02_425006 209 1 A a DT 10_1101-2021_01_02_425006 209 2 gene gene NN 10_1101-2021_01_02_425006 209 3 that that WDT 10_1101-2021_01_02_425006 209 4 is be VBZ 10_1101-2021_01_02_425006 209 5 not not RB 10_1101-2021_01_02_425006 209 6 the the DT 10_1101-2021_01_02_425006 209 7 5’-end 5’-end CD 10_1101-2021_01_02_425006 209 8 gene gene NN 10_1101-2021_01_02_425006 209 9 of of IN 10_1101-2021_01_02_425006 209 10 any any DT 10_1101-2021_01_02_425006 209 11 predicted predict VBN 10_1101-2021_01_02_425006 209 12 ATU ATU NNP 10_1101-2021_01_02_425006 209 13 is be VBZ 10_1101-2021_01_02_425006 209 14 named name VBN 10_1101-2021_01_02_425006 209 15 a a DT 10_1101-2021_01_02_425006 209 16 no no DT 10_1101-2021_01_02_425006 209 17 319 319 CD 10_1101-2021_01_02_425006 209 18 5’-end 5’-end CD 10_1101-2021_01_02_425006 209 19 gene gene NN 10_1101-2021_01_02_425006 209 20 . . . 10_1101-2021_01_02_425006 210 1 We -PRON- PRP 10_1101-2021_01_02_425006 210 2 identified identify VBD 10_1101-2021_01_02_425006 210 3 2,177/2,005 2,177/2,005 CD 10_1101-2021_01_02_425006 210 4 5’-end 5’-end CD 10_1101-2021_01_02_425006 210 5 genes gene NNS 10_1101-2021_01_02_425006 210 6 and and CC 10_1101-2021_01_02_425006 210 7 1,266/1,160 1,266/1,160 CD 10_1101-2021_01_02_425006 210 8 no no DT 10_1101-2021_01_02_425006 210 9 5’-end 5’-end CD 10_1101-2021_01_02_425006 210 10 genes gene NNS 10_1101-2021_01_02_425006 210 11 of of IN 10_1101-2021_01_02_425006 210 12 the the DT 10_1101-2021_01_02_425006 210 13 predicted predict VBN 10_1101-2021_01_02_425006 210 14 320 320 CD 10_1101-2021_01_02_425006 210 15 ATUs ATUs NNPS 10_1101-2021_01_02_425006 210 16 for for IN 10_1101-2021_01_02_425006 210 17 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 210 18 / / SYM 10_1101-2021_01_02_425006 210 19 RiEnich RiEnich NNP 10_1101-2021_01_02_425006 210 20 . . . 
10_1101-2021_01_02_425006 211 1 A a DT 10_1101-2021_01_02_425006 211 2 gene gene NN 10_1101-2021_01_02_425006 211 3 validated validate VBN 10_1101-2021_01_02_425006 211 4 by by IN 10_1101-2021_01_02_425006 211 5 experimental experimental JJ 10_1101-2021_01_02_425006 211 6 TSSs tss NNS 10_1101-2021_01_02_425006 211 7 or or CC 10_1101-2021_01_02_425006 211 8 TF TF NNP 10_1101-2021_01_02_425006 211 9 binding bind VBG 10_1101-2021_01_02_425006 211 10 sites site NNS 10_1101-2021_01_02_425006 211 11 means mean VBZ 10_1101-2021_01_02_425006 211 12 321 321 CD 10_1101-2021_01_02_425006 211 13 that that IN 10_1101-2021_01_02_425006 211 14 it -PRON- PRP 10_1101-2021_01_02_425006 211 15 is be VBZ 10_1101-2021_01_02_425006 211 16 the the DT 10_1101-2021_01_02_425006 211 17 immediate immediate JJ 10_1101-2021_01_02_425006 211 18 downstream downstream JJ 10_1101-2021_01_02_425006 211 19 gene gene NN 10_1101-2021_01_02_425006 211 20 of of IN 10_1101-2021_01_02_425006 211 21 an an DT 10_1101-2021_01_02_425006 211 22 experimental experimental JJ 10_1101-2021_01_02_425006 211 23 TSS TSS NNP 10_1101-2021_01_02_425006 211 24 or or CC 10_1101-2021_01_02_425006 211 25 TF TF NNP 10_1101-2021_01_02_425006 211 26 binding binding NN 10_1101-2021_01_02_425006 211 27 site site NN 10_1101-2021_01_02_425006 211 28 . . . 
10_1101-2021_01_02_425006 212 1 As as IN 10_1101-2021_01_02_425006 212 2 a a DT 10_1101-2021_01_02_425006 212 3 result result NN 10_1101-2021_01_02_425006 212 4 , , , 10_1101-2021_01_02_425006 212 5 the the DT 10_1101-2021_01_02_425006 212 6 322 322 CD 10_1101-2021_01_02_425006 212 7 proportion proportion NN 10_1101-2021_01_02_425006 212 8 of of IN 10_1101-2021_01_02_425006 212 9 5’-end 5’-end CD 10_1101-2021_01_02_425006 212 10 genes gene NNS 10_1101-2021_01_02_425006 212 11 of of IN 10_1101-2021_01_02_425006 212 12 the the DT 10_1101-2021_01_02_425006 212 13 predicted predict VBN 10_1101-2021_01_02_425006 212 14 ATUs atu NNS 10_1101-2021_01_02_425006 212 15 that that WDT 10_1101-2021_01_02_425006 212 16 were be VBD 10_1101-2021_01_02_425006 212 17 validated validate VBN 10_1101-2021_01_02_425006 212 18 by by IN 10_1101-2021_01_02_425006 212 19 experimental experimental JJ 10_1101-2021_01_02_425006 212 20 TSSs tss NNS 10_1101-2021_01_02_425006 212 21 or or CC 10_1101-2021_01_02_425006 212 22 TF TF NNP 10_1101-2021_01_02_425006 212 23 323 323 CD 10_1101-2021_01_02_425006 212 24 binding bind VBG 10_1101-2021_01_02_425006 212 25 sites site NNS 10_1101-2021_01_02_425006 212 26 was be VBD 10_1101-2021_01_02_425006 212 27 over over IN 10_1101-2021_01_02_425006 212 28 1.7 1.7 CD 10_1101-2021_01_02_425006 212 29 times time NNS 10_1101-2021_01_02_425006 212 30 greater great JJR 10_1101-2021_01_02_425006 212 31 than than IN 10_1101-2021_01_02_425006 212 32 that that DT 10_1101-2021_01_02_425006 212 33 of of IN 10_1101-2021_01_02_425006 212 34 the the DT 10_1101-2021_01_02_425006 212 35 no no NN 10_1101-2021_01_02_425006 212 36 5’-end 5’-end CD 10_1101-2021_01_02_425006 212 37 genes gene NNS 10_1101-2021_01_02_425006 212 38 ( ( -LRB- 10_1101-2021_01_02_425006 212 39 Table table NN 10_1101-2021_01_02_425006 212 40 1 1 CD 10_1101-2021_01_02_425006 212 41 ) ) -RRB- 10_1101-2021_01_02_425006 212 42 . . . 
10_1101-2021_01_02_425006 213 1 Specifically specifically RB 10_1101-2021_01_02_425006 213 2 , , , 10_1101-2021_01_02_425006 213 3 the the DT 10_1101-2021_01_02_425006 213 4 324 324 CD 10_1101-2021_01_02_425006 213 5 proportion proportion NN 10_1101-2021_01_02_425006 213 6 of of IN 10_1101-2021_01_02_425006 213 7 5’-end 5’-end CD 10_1101-2021_01_02_425006 213 8 genes gene NNS 10_1101-2021_01_02_425006 213 9 ( ( -LRB- 10_1101-2021_01_02_425006 213 10 29%/30 29%/30 CD 10_1101-2021_01_02_425006 213 11 % % NN 10_1101-2021_01_02_425006 213 12 for for IN 10_1101-2021_01_02_425006 213 13 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 213 14 / / SYM 10_1101-2021_01_02_425006 213 15 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 213 16 ) ) -RRB- 10_1101-2021_01_02_425006 213 17 validated validate VBN 10_1101-2021_01_02_425006 213 18 by by IN 10_1101-2021_01_02_425006 213 19 experimental experimental JJ 10_1101-2021_01_02_425006 213 20 TF TF NNP 10_1101-2021_01_02_425006 213 21 325 325 CD 10_1101-2021_01_02_425006 213 22 binding bind VBG 10_1101-2021_01_02_425006 213 23 sites site NNS 10_1101-2021_01_02_425006 213 24 was be VBD 10_1101-2021_01_02_425006 213 25 over over IN 10_1101-2021_01_02_425006 213 26 three three CD 10_1101-2021_01_02_425006 213 27 times time NNS 10_1101-2021_01_02_425006 213 28 greater great JJR 10_1101-2021_01_02_425006 213 29 than than IN 10_1101-2021_01_02_425006 213 30 the the DT 10_1101-2021_01_02_425006 213 31 no no NN 10_1101-2021_01_02_425006 213 32 5’-end 5’-end CD 10_1101-2021_01_02_425006 213 33 genes gene NNS 10_1101-2021_01_02_425006 213 34 ( ( -LRB- 10_1101-2021_01_02_425006 213 35 9.2%/9.0 9.2%/9.0 CD 10_1101-2021_01_02_425006 213 36 % % NN 10_1101-2021_01_02_425006 213 37 for for IN 10_1101-2021_01_02_425006 213 38 326 326 CD 10_1101-2021_01_02_425006 213 39 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 213 40 / / SYM 10_1101-2021_01_02_425006 213 41 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 213 
42 ) ) -RRB- 10_1101-2021_01_02_425006 213 43 . . . 10_1101-2021_01_02_425006 214 1 These these DT 10_1101-2021_01_02_425006 214 2 results result NNS 10_1101-2021_01_02_425006 214 3 further further RB 10_1101-2021_01_02_425006 214 4 verified verify VBD 10_1101-2021_01_02_425006 214 5 the the DT 10_1101-2021_01_02_425006 214 6 reliability reliability NN 10_1101-2021_01_02_425006 214 7 of of IN 10_1101-2021_01_02_425006 214 8 the the DT 10_1101-2021_01_02_425006 214 9 ATUs ATUs NNPS 10_1101-2021_01_02_425006 214 10 predicted predict VBN 10_1101-2021_01_02_425006 214 11 by by IN 10_1101-2021_01_02_425006 214 12 327 327 CD 10_1101-2021_01_02_425006 214 13 SeqATU seqatu NN 10_1101-2021_01_02_425006 214 14 in in IN 10_1101-2021_01_02_425006 214 15 terms term NNS 10_1101-2021_01_02_425006 214 16 of of IN 10_1101-2021_01_02_425006 214 17 the the DT 10_1101-2021_01_02_425006 214 18 TSS TSS NNP 10_1101-2021_01_02_425006 214 19 level level NN 10_1101-2021_01_02_425006 214 20 . . . 10_1101-2021_01_02_425006 215 1 In in IN 10_1101-2021_01_02_425006 215 2 addition addition NN 10_1101-2021_01_02_425006 215 3 , , , 10_1101-2021_01_02_425006 215 4 four four CD 10_1101-2021_01_02_425006 215 5 other other JJ 10_1101-2021_01_02_425006 215 6 experimental experimental JJ 10_1101-2021_01_02_425006 215 7 TSS tss NN 10_1101-2021_01_02_425006 215 8 or or CC 10_1101-2021_01_02_425006 215 9 promoter promoter NN 10_1101-2021_01_02_425006 215 10 datasets dataset NNS 10_1101-2021_01_02_425006 215 11 from from IN 10_1101-2021_01_02_425006 215 12 328 328 CD 10_1101-2021_01_02_425006 215 13 RegulonDB regulondb NN 10_1101-2021_01_02_425006 215 14 ( ( -LRB- 10_1101-2021_01_02_425006 215 15 19 19 CD 10_1101-2021_01_02_425006 215 16 ) ) -RRB- 10_1101-2021_01_02_425006 215 17 , , , 10_1101-2021_01_02_425006 215 18 dRNA drna NN 10_1101-2021_01_02_425006 215 19 - - HYPH 10_1101-2021_01_02_425006 215 20 seq seq NN 10_1101-2021_01_02_425006 215 21 ( ( -LRB- 10_1101-2021_01_02_425006 215 22 14 14 CD 
10_1101-2021_01_02_425006 215 23 ) ) -RRB- 10_1101-2021_01_02_425006 215 24 , , , 10_1101-2021_01_02_425006 215 25 and and CC 10_1101-2021_01_02_425006 215 26 Cappable cappable JJ 10_1101-2021_01_02_425006 215 27 - - HYPH 10_1101-2021_01_02_425006 215 28 seq seq NN 10_1101-2021_01_02_425006 215 29 ( ( -LRB- 10_1101-2021_01_02_425006 215 30 13 13 CD 10_1101-2021_01_02_425006 215 31 ) ) -RRB- 10_1101-2021_01_02_425006 215 32 were be VBD 10_1101-2021_01_02_425006 215 33 also also RB 10_1101-2021_01_02_425006 215 34 examined examine VBN 10_1101-2021_01_02_425006 215 35 . . . 10_1101-2021_01_02_425006 216 1 The the DT 10_1101-2021_01_02_425006 216 2 results result NNS 10_1101-2021_01_02_425006 216 3 are be VBP 10_1101-2021_01_02_425006 216 4 shown show VBN 10_1101-2021_01_02_425006 216 5 in in IN 10_1101-2021_01_02_425006 216 6 329 329 CD 10_1101-2021_01_02_425006 216 7 table table NN 10_1101-2021_01_02_425006 216 8 S3 S3 NNP 10_1101-2021_01_02_425006 216 9 , , , 10_1101-2021_01_02_425006 216 10 and and CC 10_1101-2021_01_02_425006 216 11 we -PRON- PRP 10_1101-2021_01_02_425006 216 12 also also RB 10_1101-2021_01_02_425006 216 13 found find VBD 10_1101-2021_01_02_425006 216 14 a a DT 10_1101-2021_01_02_425006 216 15 higher high JJR 10_1101-2021_01_02_425006 216 16 proportion proportion NN 10_1101-2021_01_02_425006 216 17 of of IN 10_1101-2021_01_02_425006 216 18 5’-end 5’-end CD 10_1101-2021_01_02_425006 216 19 genes gene NNS 10_1101-2021_01_02_425006 216 20 of of IN 10_1101-2021_01_02_425006 216 21 the the DT 10_1101-2021_01_02_425006 216 22 predicted predict VBN 10_1101-2021_01_02_425006 216 23 ATUs ATUs NNPS 10_1101-2021_01_02_425006 216 24 validated validate VBN 10_1101-2021_01_02_425006 216 25 by by IN 10_1101-2021_01_02_425006 216 26 330 330 CD 10_1101-2021_01_02_425006 216 27 experimental experimental JJ 10_1101-2021_01_02_425006 216 28 TSSs tss NNS 10_1101-2021_01_02_425006 216 29 or or CC 10_1101-2021_01_02_425006 216 30 promoters promoter NNS 
10_1101-2021_01_02_425006 216 31 than than IN 10_1101-2021_01_02_425006 216 32 that that DT 10_1101-2021_01_02_425006 216 33 of of IN 10_1101-2021_01_02_425006 216 34 no no DT 10_1101-2021_01_02_425006 216 35 5’-end 5’-end CD 10_1101-2021_01_02_425006 216 36 genes gene NNS 10_1101-2021_01_02_425006 216 37 . . . 10_1101-2021_01_02_425006 217 1 331 331 CD 10_1101-2021_01_02_425006 217 2 We -PRON- PRP 10_1101-2021_01_02_425006 217 3 also also RB 10_1101-2021_01_02_425006 217 4 used use VBD 10_1101-2021_01_02_425006 217 5 two two CD 10_1101-2021_01_02_425006 217 6 experimental experimental JJ 10_1101-2021_01_02_425006 217 7 TTS TTS NNP 10_1101-2021_01_02_425006 217 8 datasets dataset NNS 10_1101-2021_01_02_425006 217 9 of of IN 10_1101-2021_01_02_425006 217 10 E. E. NNP 10_1101-2021_01_02_425006 217 11 coli coli NNS 10_1101-2021_01_02_425006 217 12 from from IN 10_1101-2021_01_02_425006 217 13 SEnd SEnd NNP 10_1101-2021_01_02_425006 217 14 - - HYPH 10_1101-2021_01_02_425006 217 15 seq seq NNP 10_1101-2021_01_02_425006 217 16 ( ( -LRB- 10_1101-2021_01_02_425006 217 17 7 7 CD 10_1101-2021_01_02_425006 217 18 ) ) -RRB- 10_1101-2021_01_02_425006 217 19 and and CC 10_1101-2021_01_02_425006 217 20 RegulonDB regulondb NN 10_1101-2021_01_02_425006 217 21 ( ( -LRB- 10_1101-2021_01_02_425006 217 22 19 19 CD 10_1101-2021_01_02_425006 217 23 ) ) -RRB- 10_1101-2021_01_02_425006 217 24 to to IN 10_1101-2021_01_02_425006 217 25 332 332 CD 
10_1101-2021_01_02_425006 218 25 verify verify VBP 10_1101-2021_01_02_425006 218 26 the the DT 10_1101-2021_01_02_425006 218 27 reliability reliability NN 10_1101-2021_01_02_425006 218 28 of of IN 10_1101-2021_01_02_425006 218 29 predicted predict VBN 10_1101-2021_01_02_425006 218 30 ATUs atu NNS 10_1101-2021_01_02_425006 218 31 by by IN 10_1101-2021_01_02_425006 218 32 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 218 33 in in IN 10_1101-2021_01_02_425006 218 34 terms term NNS 10_1101-2021_01_02_425006 218 35 of of IN 10_1101-2021_01_02_425006 218 36 TTS TTS NNP 10_1101-2021_01_02_425006 218 37 level level NN 10_1101-2021_01_02_425006 
218 38 . . . 10_1101-2021_01_02_425006 219 1 These these DT 10_1101-2021_01_02_425006 219 2 two two CD 10_1101-2021_01_02_425006 219 3 experimental experimental JJ 10_1101-2021_01_02_425006 219 4 TTS TTS NNP 10_1101-2021_01_02_425006 219 5 333 333 CD 10_1101-2021_01_02_425006 219 6 datasets dataset NNS 10_1101-2021_01_02_425006 219 7 were be VBD 10_1101-2021_01_02_425006 219 8 named name VBN 10_1101-2021_01_02_425006 219 9 dataset dataset NNP 10_1101-2021_01_02_425006 219 10 3 3 CD 10_1101-2021_01_02_425006 219 11 and and CC 10_1101-2021_01_02_425006 219 12 dataset dataset VBD 10_1101-2021_01_02_425006 219 13 4 4 CD 10_1101-2021_01_02_425006 219 14 , , , 10_1101-2021_01_02_425006 219 15 respectively respectively RB 10_1101-2021_01_02_425006 219 16 . . . 10_1101-2021_01_02_425006 220 1 There there EX 10_1101-2021_01_02_425006 220 2 were be VBD 10_1101-2021_01_02_425006 220 3 1,540 1,540 CD 10_1101-2021_01_02_425006 220 4 experimental experimental JJ 10_1101-2021_01_02_425006 220 5 TTSs TTSs NNPS 10_1101-2021_01_02_425006 220 6 in in IN 10_1101-2021_01_02_425006 220 7 334 334 CD 10_1101-2021_01_02_425006 220 8 dataset dataset NN 10_1101-2021_01_02_425006 220 9 3 3 CD 10_1101-2021_01_02_425006 220 10 and and CC 10_1101-2021_01_02_425006 220 11 367 367 CD 10_1101-2021_01_02_425006 220 12 experimental experimental JJ 10_1101-2021_01_02_425006 220 13 TTSs TTSs NNPS 10_1101-2021_01_02_425006 220 14 in in IN 10_1101-2021_01_02_425006 220 15 dataset dataset NN 10_1101-2021_01_02_425006 220 16 4 4 CD 10_1101-2021_01_02_425006 220 17 . . . 
10_1101-2021_01_02_425006 221 1 We -PRON- PRP 10_1101-2021_01_02_425006 221 2 considered consider VBD 10_1101-2021_01_02_425006 221 3 the the DT 10_1101-2021_01_02_425006 221 4 3’-end 3’-end CD 10_1101-2021_01_02_425006 221 5 genes gene NNS 10_1101-2021_01_02_425006 221 6 and and CC 10_1101-2021_01_02_425006 221 7 no no DT 10_1101-2021_01_02_425006 221 8 3’-end 3’-end CD 10_1101-2021_01_02_425006 221 9 genes gene NNS 10_1101-2021_01_02_425006 221 10 335 335 CD 10_1101-2021_01_02_425006 221 11 of of IN 10_1101-2021_01_02_425006 221 12 the the DT 10_1101-2021_01_02_425006 221 13 predicted predict VBN 10_1101-2021_01_02_425006 221 14 ATUs ATUs NNPS 10_1101-2021_01_02_425006 221 15 by by IN 10_1101-2021_01_02_425006 221 16 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 221 17 . . . 10_1101-2021_01_02_425006 222 1 A a DT 10_1101-2021_01_02_425006 222 2 gene gene NN 10_1101-2021_01_02_425006 222 3 that that WDT 10_1101-2021_01_02_425006 222 4 is be VBZ 10_1101-2021_01_02_425006 222 5 not not RB 10_1101-2021_01_02_425006 222 6 the the DT 10_1101-2021_01_02_425006 222 7 3’-end 3’-end CD 10_1101-2021_01_02_425006 222 8 gene gene NN 10_1101-2021_01_02_425006 222 9 of of IN 10_1101-2021_01_02_425006 222 10 any any DT 10_1101-2021_01_02_425006 222 11 predicted predict VBN 10_1101-2021_01_02_425006 222 12 ATU ATU NNP 10_1101-2021_01_02_425006 222 13 is be VBZ 10_1101-2021_01_02_425006 222 14 named name VBN 10_1101-2021_01_02_425006 222 15 a a DT 10_1101-2021_01_02_425006 222 16 336 336 CD 10_1101-2021_01_02_425006 222 17 no no DT 10_1101-2021_01_02_425006 222 18 3’-end 3’-end CD 10_1101-2021_01_02_425006 222 19 gene gene NN 10_1101-2021_01_02_425006 222 20 . . . 
10_1101-2021_01_02_425006 223 1 We -PRON- PRP 10_1101-2021_01_02_425006 223 2 identified identify VBD 10_1101-2021_01_02_425006 223 3 2,290/2,187 2,290/2,187 , 10_1101-2021_01_02_425006 223 4 3’-end 3’-end CD 10_1101-2021_01_02_425006 223 5 genes gene NNS 10_1101-2021_01_02_425006 223 6 and and CC 10_1101-2021_01_02_425006 223 7 1,153/978 1,153/978 CD 10_1101-2021_01_02_425006 223 8 no no DT 10_1101-2021_01_02_425006 223 9 3’-end 3’-end CD 10_1101-2021_01_02_425006 223 10 genes gene NNS 10_1101-2021_01_02_425006 223 11 of of IN 10_1101-2021_01_02_425006 223 12 the the DT 10_1101-2021_01_02_425006 223 13 predicted predict VBN 10_1101-2021_01_02_425006 223 14 337 337 CD 10_1101-2021_01_02_425006 223 15 ATUs atu NNS 10_1101-2021_01_02_425006 223 16 for for IN 10_1101-2021_01_02_425006 223 17 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 223 18 / / SYM 10_1101-2021_01_02_425006 223 19 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 223 20 . . . 10_1101-2021_01_02_425006 224 1 A a DT 10_1101-2021_01_02_425006 224 2 gene gene NN 10_1101-2021_01_02_425006 224 3 validated validate VBN 10_1101-2021_01_02_425006 224 4 by by IN 10_1101-2021_01_02_425006 224 5 experimental experimental JJ 10_1101-2021_01_02_425006 224 6 TTSs TTSs NNPS 10_1101-2021_01_02_425006 224 7 means mean VBZ 10_1101-2021_01_02_425006 224 8 that that IN 10_1101-2021_01_02_425006 224 9 it -PRON- PRP 10_1101-2021_01_02_425006 224 10 is be VBZ 10_1101-2021_01_02_425006 224 11 the the DT 10_1101-2021_01_02_425006 224 12 338 338 CD 10_1101-2021_01_02_425006 224 13 immediate immediate JJ 10_1101-2021_01_02_425006 224 14 upstream upstream JJ 10_1101-2021_01_02_425006 224 15 gene gene NN 10_1101-2021_01_02_425006 224 16 of of IN 10_1101-2021_01_02_425006 224 17 an an DT 10_1101-2021_01_02_425006 224 18 experimental experimental JJ 10_1101-2021_01_02_425006 224 19 TTS TTS NNP 10_1101-2021_01_02_425006 224 20 . . . 
10_1101-2021_01_02_425006 225 1 As as IN 10_1101-2021_01_02_425006 225 2 a a DT 10_1101-2021_01_02_425006 225 3 result result NN 10_1101-2021_01_02_425006 225 4 , , , 10_1101-2021_01_02_425006 225 5 the the DT 10_1101-2021_01_02_425006 225 6 proportion proportion NN 10_1101-2021_01_02_425006 225 7 of of IN 10_1101-2021_01_02_425006 225 8 3’-end 3’-end CD 10_1101-2021_01_02_425006 225 9 genes gene NNS 10_1101-2021_01_02_425006 225 10 of of IN 10_1101-2021_01_02_425006 225 11 the the DT 10_1101-2021_01_02_425006 225 12 339 339 CD 10_1101-2021_01_02_425006 225 13 predicted predict VBD 10_1101-2021_01_02_425006 225 14 ATUs atu NNS 10_1101-2021_01_02_425006 225 15 that that WDT 10_1101-2021_01_02_425006 225 16 were be VBD 10_1101-2021_01_02_425006 225 17 validated validate VBN 10_1101-2021_01_02_425006 225 18 by by IN 10_1101-2021_01_02_425006 225 19 experimental experimental JJ 10_1101-2021_01_02_425006 225 20 TTSs TTSs NNPS 10_1101-2021_01_02_425006 225 21 was be VBD 10_1101-2021_01_02_425006 225 22 over over IN 10_1101-2021_01_02_425006 225 23 two two CD 10_1101-2021_01_02_425006 225 24 times time NNS 10_1101-2021_01_02_425006 225 25 greater great JJR 10_1101-2021_01_02_425006 225 26 than than IN 10_1101-2021_01_02_425006 225 27 that that DT 10_1101-2021_01_02_425006 225 28 of of IN 10_1101-2021_01_02_425006 225 29 no no DT 10_1101-2021_01_02_425006 225 30 340 340 CD 10_1101-2021_01_02_425006 225 31 3’-end 3’-end CD 10_1101-2021_01_02_425006 225 32 genes gene NNS 10_1101-2021_01_02_425006 225 33 ( ( -LRB- 10_1101-2021_01_02_425006 225 34 Table table NN 10_1101-2021_01_02_425006 225 35 2 2 CD 10_1101-2021_01_02_425006 225 36 ) ) -RRB- 10_1101-2021_01_02_425006 225 37 . . . 
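The TTS-level check described in the preceding sentences can be illustrated with a small sketch. It is not the authors' code. It assumes a single forward strand, genes given as hypothetical (name, start, end) tuples, and reads "immediate upstream gene of an experimental TTS" as the gene whose end coordinate is the closest one at or before the TTS position. The function names validated_genes and validated_fraction are made up for this illustration.

```python
from bisect import bisect_right


def validated_genes(genes, tts_positions):
    """Return the names of genes that are the immediate upstream gene of a TTS.

    genes: iterable of (name, start, end) tuples on the forward strand
           (illustrative layout, not the paper's data format).
    tts_positions: iterable of integer TTS coordinates.
    """
    ordered = sorted(genes, key=lambda g: g[2])  # sort by gene end coordinate
    ends = [g[2] for g in ordered]
    hits = set()
    for pos in tts_positions:
        i = bisect_right(ends, pos)  # count of genes ending at or before pos
        if i > 0:
            hits.add(ordered[i - 1][0])  # closest gene ending upstream of the TTS
    return hits


def validated_fraction(group, validated):
    """Fraction of the gene names in `group` that appear in `validated`."""
    group = set(group)
    return len(group & validated) / len(group) if group else 0.0
```

With such a helper, the comparison in the text amounts to calling validated_fraction once for the 3’-end genes of the predicted ATUs and once for the no 3’-end genes, then comparing the two fractions.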
10_1101-2021_01_02_425006 226 1 Specifically specifically RB 10_1101-2021_01_02_425006 226 2 , , , 10_1101-2021_01_02_425006 226 3 the the DT 10_1101-2021_01_02_425006 226 4 proportion proportion NN 10_1101-2021_01_02_425006 226 5 of of IN 10_1101-2021_01_02_425006 226 6 3’-end 3’-end CD 10_1101-2021_01_02_425006 226 7 genes gene NNS 10_1101-2021_01_02_425006 226 8 ( ( -LRB- 10_1101-2021_01_02_425006 226 9 51%/53 51%/53 CD 10_1101-2021_01_02_425006 226 10 % % NN 10_1101-2021_01_02_425006 226 11 for for IN 10_1101-2021_01_02_425006 226 12 341 341 CD 10_1101-2021_01_02_425006 226 13 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 226 14 / / SYM 10_1101-2021_01_02_425006 226 15 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 226 16 ) ) -RRB- 10_1101-2021_01_02_425006 226 17 validated validate VBN 10_1101-2021_01_02_425006 226 18 by by IN 10_1101-2021_01_02_425006 226 19 experimental experimental JJ 10_1101-2021_01_02_425006 226 20 TTSs TTSs NNPS 10_1101-2021_01_02_425006 226 21 from from IN 10_1101-2021_01_02_425006 226 22 SEnd SEnd NNP 10_1101-2021_01_02_425006 226 23 - - HYPH 10_1101-2021_01_02_425006 226 24 seq seq NNP 10_1101-2021_01_02_425006 226 25 was be VBD 10_1101-2021_01_02_425006 226 26 over over IN 10_1101-2021_01_02_425006 226 27 three three CD 10_1101-2021_01_02_425006 226 28 times time NNS 10_1101-2021_01_02_425006 226 29 342 342 CD 10_1101-2021_01_02_425006 226 30 greater great JJR 10_1101-2021_01_02_425006 226 31 than than IN 10_1101-2021_01_02_425006 226 32 that that DT 10_1101-2021_01_02_425006 226 33 of of IN 10_1101-2021_01_02_425006 226 34 no no DT 10_1101-2021_01_02_425006 226 35 3’-end 3’-end CD 10_1101-2021_01_02_425006 226 36 genes gene NNS 10_1101-2021_01_02_425006 226 37 ( ( -LRB- 10_1101-2021_01_02_425006 226 38 15%/14 15%/14 CD 10_1101-2021_01_02_425006 226 39 % % NN 10_1101-2021_01_02_425006 226 40 for for IN 10_1101-2021_01_02_425006 226 41 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 226 42 / / SYM 
10_1101-2021_01_02_425006 226 43 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 226 44 ) ) -RRB- 10_1101-2021_01_02_425006 226 45 . . . 10_1101-2021_01_02_425006 227 1 These these DT 10_1101-2021_01_02_425006 227 2 results result NNS 10_1101-2021_01_02_425006 227 3 further far RBR 10_1101-2021_01_02_425006 227 4 343 343 CD 10_1101-2021_01_02_425006 227 5 verified verify VBD 10_1101-2021_01_02_425006 227 6 the the DT 10_1101-2021_01_02_425006 227 7 reliability reliability NN 10_1101-2021_01_02_425006 227 8 of of IN 10_1101-2021_01_02_425006 227 9 the the DT 10_1101-2021_01_02_425006 227 10 ATUs atu NNS 10_1101-2021_01_02_425006 227 11 predicted predict VBN 10_1101-2021_01_02_425006 227 12 by by IN 10_1101-2021_01_02_425006 227 13 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 227 14 in in IN 10_1101-2021_01_02_425006 227 15 terms term NNS 10_1101-2021_01_02_425006 227 16 of of IN 10_1101-2021_01_02_425006 227 17 the the DT 10_1101-2021_01_02_425006 227 18 TTS TTS NNP 10_1101-2021_01_02_425006 227 19 level level NN 10_1101-2021_01_02_425006 227 20 . . . 10_1101-2021_01_02_425006 228 1 In in IN 10_1101-2021_01_02_425006 228 2 addition addition NN 10_1101-2021_01_02_425006 228 3 , , , 10_1101-2021_01_02_425006 228 4 two two CD 10_1101-2021_01_02_425006 228 5 344 344 CD 10_1101-2021_01_02_425006 228 6 other other JJ 10_1101-2021_01_02_425006 228 7 computationally computationally RB 10_1101-2021_01_02_425006 228 8 predicted predict VBD 10_1101-2021_01_02_425006 228 9 TTS TTS NNP 10_1101-2021_01_02_425006 228 10 datasets dataset NNS 10_1101-2021_01_02_425006 228 11 from from IN 10_1101-2021_01_02_425006 228 12 the the DT 10_1101-2021_01_02_425006 228 13 works work NNS 10_1101-2021_01_02_425006 228 14 by by IN 10_1101-2021_01_02_425006 228 15 Nadiras Nadiras NNP 10_1101-2021_01_02_425006 228 16 et et FW 10_1101-2021_01_02_425006 228 17 al al NNP 10_1101-2021_01_02_425006 228 18 . . . 
10_1101-2021_01_02_425006 229 1 ( ( -LRB- 10_1101-2021_01_02_425006 229 2 40 40 CD 10_1101-2021_01_02_425006 229 3 ) ) -RRB- 10_1101-2021_01_02_425006 229 4 and and CC 10_1101-2021_01_02_425006 229 5 Kingsford Kingsford NNP 10_1101-2021_01_02_425006 229 6 et et NNP 10_1101-2021_01_02_425006 229 7 al al NNP 10_1101-2021_01_02_425006 229 8 . . . 10_1101-2021_01_02_425006 230 1 345 345 CD 10_1101-2021_01_02_425006 230 2 ( ( -LRB- 10_1101-2021_01_02_425006 230 3 41 41 CD 10_1101-2021_01_02_425006 230 4 ) ) -RRB- 10_1101-2021_01_02_425006 230 5 were be VBD 10_1101-2021_01_02_425006 230 6 also also RB 10_1101-2021_01_02_425006 230 7 examined examine VBN 10_1101-2021_01_02_425006 230 8 . . . 10_1101-2021_01_02_425006 231 1 The the DT 10_1101-2021_01_02_425006 231 2 results result NNS 10_1101-2021_01_02_425006 231 3 are be VBP 10_1101-2021_01_02_425006 231 4 shown show VBN 10_1101-2021_01_02_425006 231 5 in in IN 10_1101-2021_01_02_425006 231 6 table table NN 10_1101-2021_01_02_425006 231 7 S4 s4 NN 10_1101-2021_01_02_425006 231 8 , , , 10_1101-2021_01_02_425006 231 9 and and CC 10_1101-2021_01_02_425006 231 10 we -PRON- PRP 10_1101-2021_01_02_425006 231 11 also also RB 10_1101-2021_01_02_425006 231 12 found find VBD 10_1101-2021_01_02_425006 231 13 the the DT 10_1101-2021_01_02_425006 231 14 proportion proportion NN 10_1101-2021_01_02_425006 231 15 of of IN 10_1101-2021_01_02_425006 231 16 3’-end 3’-end CD 10_1101-2021_01_02_425006 231 17 346 346 CD 10_1101-2021_01_02_425006 231 18 genes gene NNS 10_1101-2021_01_02_425006 231 19 ( ( -LRB- 10_1101-2021_01_02_425006 231 20 63%/62 63%/62 NNP 10_1101-2021_01_02_425006 231 21 % % NN 10_1101-2021_01_02_425006 231 22 for for IN 10_1101-2021_01_02_425006 231 23 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 231 24 / / SYM 10_1101-2021_01_02_425006 231 25 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 231 26 ) ) -RRB- 10_1101-2021_01_02_425006 231 27 validated validate VBN 10_1101-2021_01_02_425006 231 28 by by 
IN 10_1101-2021_01_02_425006 231 29 computationally computationally RB 10_1101-2021_01_02_425006 231 30 predicted predict VBD 10_1101-2021_01_02_425006 231 31 Rho-347 rho-347 CD 10_1101-2021_01_02_425006 231 32 independent independent JJ 10_1101-2021_01_02_425006 231 33 TTSs TTSs NNPS 10_1101-2021_01_02_425006 231 34 was be VBD 10_1101-2021_01_02_425006 231 35 over over IN 10_1101-2021_01_02_425006 231 36 two two CD 10_1101-2021_01_02_425006 231 37 times time NNS 10_1101-2021_01_02_425006 231 38 greater great JJR 10_1101-2021_01_02_425006 231 39 than than IN 10_1101-2021_01_02_425006 231 40 that that DT 10_1101-2021_01_02_425006 231 41 of of IN 10_1101-2021_01_02_425006 231 42 no no DT 10_1101-2021_01_02_425006 231 43 3’-end 3’-end CD 10_1101-2021_01_02_425006 231 44 genes gene NNS 10_1101-2021_01_02_425006 231 45 ( ( -LRB- 10_1101-2021_01_02_425006 231 46 29%/29 29%/29 CD 10_1101-2021_01_02_425006 231 47 % % NN 10_1101-2021_01_02_425006 231 48 for for IN 10_1101-2021_01_02_425006 231 49 348 348 CD 10_1101-2021_01_02_425006 231 50 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 231 51 / / SYM 10_1101-2021_01_02_425006 231 52 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 231 53 ) ) -RRB- 10_1101-2021_01_02_425006 231 54 . . . 10_1101-2021_01_02_425006 232 1 349 349 CD 10_1101-2021_01_02_425006 232 2 Please please UH 10_1101-2021_01_02_425006 232 3 place place VB 10_1101-2021_01_02_425006 232 4 Table table NN 10_1101-2021_01_02_425006 232 5 1 1 CD 10_1101-2021_01_02_425006 232 6 here here RB 10_1101-2021_01_02_425006 232 7 . . . 10_1101-2021_01_02_425006 233 1 350 350 CD 10_1101-2021_01_02_425006 233 2 Please please UH 10_1101-2021_01_02_425006 233 3 place place VB 10_1101-2021_01_02_425006 233 4 Table table NN 10_1101-2021_01_02_425006 233 5 2 2 CD 10_1101-2021_01_02_425006 233 6 here here RB 10_1101-2021_01_02_425006 233 7 . . . 
10_1101-2021_01_02_425006 234 1 351 351 CD 10_1101-2021_01_02_425006 235 25 The the DT 10_1101-2021_01_02_425006 235 26 gene gene NN 10_1101-2021_01_02_425006 235 27 pairs pair NNS 10_1101-2021_01_02_425006 235 28 frequently frequently RB 10_1101-2021_01_02_425006 235 29 encoded encode VBN 10_1101-2021_01_02_425006 235 30 in in IN 10_1101-2021_01_02_425006 235 31 the the DT 10_1101-2021_01_02_425006 235 32 same same JJ 10_1101-2021_01_02_425006 235 33 ATUs atu NNS 10_1101-2021_01_02_425006 235 34 are be VBP 10_1101-2021_01_02_425006 235 35 more more RBR 10_1101-2021_01_02_425006 235 36 functionally functionally RB 10_1101-2021_01_02_425006 235 37 related related JJ 
10_1101-2021_01_02_425006 235 38 than than IN 10_1101-2021_01_02_425006 235 39 those those DT 10_1101-2021_01_02_425006 235 40 that that IN 10_1101-2021_01_02_425006 235 41 352 352 CD 10_1101-2021_01_02_425006 235 42 can can MD 10_1101-2021_01_02_425006 235 43 belong belong VB 10_1101-2021_01_02_425006 235 44 to to IN 10_1101-2021_01_02_425006 235 45 two two CD 10_1101-2021_01_02_425006 235 46 distinct distinct JJ 10_1101-2021_01_02_425006 235 47 ATUs atu NNS 10_1101-2021_01_02_425006 235 48 353 353 CD 10_1101-2021_01_02_425006 235 49 Functional functional JJ 10_1101-2021_01_02_425006 235 50 analysis analysis NN 10_1101-2021_01_02_425006 235 51 was be VBD 10_1101-2021_01_02_425006 235 52 conducted conduct VBN 10_1101-2021_01_02_425006 235 53 by by IN 10_1101-2021_01_02_425006 235 54 integrating integrate VBG 10_1101-2021_01_02_425006 235 55 GO GO NNP 10_1101-2021_01_02_425006 235 56 terms term NNS 10_1101-2021_01_02_425006 235 57 from from IN 10_1101-2021_01_02_425006 235 58 the the DT 10_1101-2021_01_02_425006 235 59 Gene Gene NNP 10_1101-2021_01_02_425006 235 60 Ontology Ontology NNP 10_1101-2021_01_02_425006 235 61 ( ( -LRB- 10_1101-2021_01_02_425006 235 62 GO GO NNP 10_1101-2021_01_02_425006 235 63 ) ) -RRB- 10_1101-2021_01_02_425006 235 64 database database NN 10_1101-2021_01_02_425006 235 65 354 354 CD 10_1101-2021_01_02_425006 235 66 ( ( -LRB- 10_1101-2021_01_02_425006 235 67 42 42 CD 10_1101-2021_01_02_425006 235 68 ) ) -RRB- 10_1101-2021_01_02_425006 235 69 . . . 
10_1101-2021_01_02_425006 236 1 In in IN 10_1101-2021_01_02_425006 236 2 detail detail NN 10_1101-2021_01_02_425006 236 3 , , , 10_1101-2021_01_02_425006 236 4 we -PRON- PRP 10_1101-2021_01_02_425006 236 5 measured measure VBD 10_1101-2021_01_02_425006 236 6 the the DT 10_1101-2021_01_02_425006 236 7 level level NN 10_1101-2021_01_02_425006 236 8 of of IN 10_1101-2021_01_02_425006 236 9 functional functional JJ 10_1101-2021_01_02_425006 236 10 relatedness relatedness NN 10_1101-2021_01_02_425006 236 11 for for IN 10_1101-2021_01_02_425006 236 12 two two CD 10_1101-2021_01_02_425006 236 13 types type NNS 10_1101-2021_01_02_425006 236 14 of of IN 10_1101-2021_01_02_425006 236 15 consecutive consecutive JJ 10_1101-2021_01_02_425006 236 16 gene gene NN 10_1101-2021_01_02_425006 236 17 pairs pair NNS 10_1101-2021_01_02_425006 236 18 , , , 10_1101-2021_01_02_425006 236 19 355 355 CD 10_1101-2021_01_02_425006 236 20 which which WDT 10_1101-2021_01_02_425006 236 21 is be VBZ 10_1101-2021_01_02_425006 236 22 similar similar JJ 10_1101-2021_01_02_425006 236 23 to to IN 10_1101-2021_01_02_425006 236 24 the the DT 10_1101-2021_01_02_425006 236 25 definition definition NN 10_1101-2021_01_02_425006 236 26 in in IN 10_1101-2021_01_02_425006 236 27 the the DT 10_1101-2021_01_02_425006 236 28 work work NN 10_1101-2021_01_02_425006 236 29 by by IN 10_1101-2021_01_02_425006 236 30 Mao Mao NNP 10_1101-2021_01_02_425006 236 31 et et NNP 10_1101-2021_01_02_425006 236 32 al al NNP 10_1101-2021_01_02_425006 236 33 . . . 10_1101-2021_01_02_425006 237 1 ( ( -LRB- 10_1101-2021_01_02_425006 237 2 38 38 CD 10_1101-2021_01_02_425006 237 3 ) ) -RRB- 10_1101-2021_01_02_425006 237 4 . . . 
10_1101-2021_01_02_425006 238 1 Two two CD 10_1101-2021_01_02_425006 238 2 types type NNS 10_1101-2021_01_02_425006 238 3 of of IN 10_1101-2021_01_02_425006 238 4 consecutive consecutive JJ 10_1101-2021_01_02_425006 238 5 gene gene NN 10_1101-2021_01_02_425006 238 6 pairs pairs NNP 10_1101-2021_01_02_425006 238 7 356 356 CD 10_1101-2021_01_02_425006 238 8 were be VBD 10_1101-2021_01_02_425006 238 9 ( ( -LRB- 10_1101-2021_01_02_425006 238 10 i i NN 10_1101-2021_01_02_425006 238 11 ) ) -RRB- 10_1101-2021_01_02_425006 238 12 gene gene VBP 10_1101-2021_01_02_425006 238 13 pairs pair NNS 10_1101-2021_01_02_425006 238 14 each each DT 10_1101-2021_01_02_425006 238 15 consisting consisting NN 10_1101-2021_01_02_425006 238 16 of of IN 10_1101-2021_01_02_425006 238 17 a a DT 10_1101-2021_01_02_425006 238 18 5’-end 5’-end CD 10_1101-2021_01_02_425006 238 19 gene gene NN 10_1101-2021_01_02_425006 238 20 of of IN 10_1101-2021_01_02_425006 238 21 an an DT 10_1101-2021_01_02_425006 238 22 ATU ATU NNP 10_1101-2021_01_02_425006 238 23 and and CC 10_1101-2021_01_02_425006 238 24 the the DT 10_1101-2021_01_02_425006 238 25 gene gene NN 10_1101-2021_01_02_425006 238 26 in in IN 10_1101-2021_01_02_425006 238 27 its -PRON- PRP$ 10_1101-2021_01_02_425006 238 28 immediate immediate JJ 10_1101-2021_01_02_425006 238 29 upstream upstream NN 10_1101-2021_01_02_425006 238 30 357 357 CD 10_1101-2021_01_02_425006 238 31 on on IN 10_1101-2021_01_02_425006 238 32 the the DT 10_1101-2021_01_02_425006 238 33 same same JJ 10_1101-2021_01_02_425006 238 34 strand strand NN 10_1101-2021_01_02_425006 238 35 and and CC 10_1101-2021_01_02_425006 238 36 ( ( -LRB- 10_1101-2021_01_02_425006 238 37 ii ii LS 10_1101-2021_01_02_425006 238 38 ) ) -RRB- 10_1101-2021_01_02_425006 238 39 all all PDT 10_1101-2021_01_02_425006 238 40 the the DT 10_1101-2021_01_02_425006 238 41 other other JJ 10_1101-2021_01_02_425006 238 42 gene gene NN 10_1101-2021_01_02_425006 238 43 pairs pair NNS 10_1101-2021_01_02_425006 238 44 
inside inside IN 10_1101-2021_01_02_425006 238 45 an an DT 10_1101-2021_01_02_425006 238 46 ATU ATU NNP 10_1101-2021_01_02_425006 238 47 ( ( -LRB- 10_1101-2021_01_02_425006 238 48 Fig fig NN 10_1101-2021_01_02_425006 238 49 . . . 10_1101-2021_01_02_425006 239 1 7A 7a NN 10_1101-2021_01_02_425006 239 2 ) ) -RRB- 10_1101-2021_01_02_425006 239 3 . . . 10_1101-2021_01_02_425006 240 1 In in IN 10_1101-2021_01_02_425006 240 2 addition addition NN 10_1101-2021_01_02_425006 240 3 , , , 10_1101-2021_01_02_425006 240 4 we -PRON- PRP 10_1101-2021_01_02_425006 240 5 used use VBD 10_1101-2021_01_02_425006 240 6 a a DT 10_1101-2021_01_02_425006 240 7 358 358 CD 10_1101-2021_01_02_425006 240 8 scoring score VBG 10_1101-2021_01_02_425006 240 9 scheme scheme NN 10_1101-2021_01_02_425006 240 10 to to TO 10_1101-2021_01_02_425006 240 11 measure measure VB 10_1101-2021_01_02_425006 240 12 the the DT 10_1101-2021_01_02_425006 240 13 GO GO NNP 10_1101-2021_01_02_425006 240 14 - - HYPH 10_1101-2021_01_02_425006 240 15 based base VBN 10_1101-2021_01_02_425006 240 16 functional functional JJ 10_1101-2021_01_02_425006 240 17 similarity similarity NN 10_1101-2021_01_02_425006 240 18 between between IN 10_1101-2021_01_02_425006 240 19 a a DT 10_1101-2021_01_02_425006 240 20 pair pair NN 10_1101-2021_01_02_425006 240 21 of of IN 10_1101-2021_01_02_425006 240 22 genes gene NNS 10_1101-2021_01_02_425006 240 23 by by IN 10_1101-2021_01_02_425006 240 24 Wu Wu NNP 10_1101-2021_01_02_425006 240 25 et et NNP 10_1101-2021_01_02_425006 240 26 al al NNP 10_1101-2021_01_02_425006 240 27 . . . 10_1101-2021_01_02_425006 241 1 ( ( -LRB- 10_1101-2021_01_02_425006 241 2 43 43 CD 10_1101-2021_01_02_425006 241 3 ) ) -RRB- 10_1101-2021_01_02_425006 241 4 . . . 
10_1101-2021_01_02_425006 242 1 359 359 CD 10_1101-2021_01_02_425006 242 2 This this DT 10_1101-2021_01_02_425006 242 3 study study NN 10_1101-2021_01_02_425006 242 4 developed develop VBD 10_1101-2021_01_02_425006 242 5 a a DT 10_1101-2021_01_02_425006 242 6 GO GO NNP 10_1101-2021_01_02_425006 242 7 similarity similarity NN 10_1101-2021_01_02_425006 242 8 score score NN 10_1101-2021_01_02_425006 242 9 and and CC 10_1101-2021_01_02_425006 242 10 showed show VBD 10_1101-2021_01_02_425006 242 11 that that IN 10_1101-2021_01_02_425006 242 12 the the DT 10_1101-2021_01_02_425006 242 13 larger large JJR 10_1101-2021_01_02_425006 242 14 the the DT 10_1101-2021_01_02_425006 242 15 score score NN 10_1101-2021_01_02_425006 242 16 , , , 10_1101-2021_01_02_425006 242 17 the the DT 10_1101-2021_01_02_425006 242 18 more more RBR 10_1101-2021_01_02_425006 242 19 likely likely JJ 10_1101-2021_01_02_425006 242 20 that that IN 10_1101-2021_01_02_425006 242 21 360 360 CD 10_1101-2021_01_02_425006 242 22 two two CD 10_1101-2021_01_02_425006 242 23 genes gene NNS 10_1101-2021_01_02_425006 242 24 are be VBP 10_1101-2021_01_02_425006 242 25 functionally functionally RB 10_1101-2021_01_02_425006 242 26 related relate VBN 10_1101-2021_01_02_425006 242 27 . . . 
10_1101-2021_01_02_425006 243 1 In in IN 10_1101-2021_01_02_425006 243 2 brief brief NN 10_1101-2021_01_02_425006 243 3 , , , 10_1101-2021_01_02_425006 243 4 the the DT 10_1101-2021_01_02_425006 243 5 GO GO NNP 10_1101-2021_01_02_425006 243 6 similarity similarity NN 10_1101-2021_01_02_425006 243 7 score score NN 10_1101-2021_01_02_425006 243 8 of of IN 10_1101-2021_01_02_425006 243 9 a a DT 10_1101-2021_01_02_425006 243 10 gene gene NN 10_1101-2021_01_02_425006 243 11 pair pair NN 10_1101-2021_01_02_425006 243 12 � � NNP 10_1101-2021_01_02_425006 243 13 � � NNP 10_1101-2021_01_02_425006 243 14 and and CC 10_1101-2021_01_02_425006 243 15 � � NNP 10_1101-2021_01_02_425006 243 16 � � NNP 10_1101-2021_01_02_425006 243 17 is be VBZ 10_1101-2021_01_02_425006 243 18 361 361 CD 10_1101-2021_01_02_425006 243 19 denoted denote VBN 10_1101-2021_01_02_425006 243 20 as as IN 10_1101-2021_01_02_425006 243 21 � � NNP 10_1101-2021_01_02_425006 243 22 � � NNS 10_1101-2021_01_02_425006 243 23 � � , 10_1101-2021_01_02_425006 243 24 ( ( -LRB- 10_1101-2021_01_02_425006 243 25 � � NNP 10_1101-2021_01_02_425006 243 26 � � NNP 10_1101-2021_01_02_425006 243 27 , , , 10_1101-2021_01_02_425006 243 28 � � NNP 10_1101-2021_01_02_425006 243 29 � � NNP 10_1101-2021_01_02_425006 243 30 ): ): NN 10_1101-2021_01_02_425006 243 31 362 362 CD 10_1101-2021_01_02_425006 243 32 � � NNS 10_1101-2021_01_02_425006 243 33 � � JJ 10_1101-2021_01_02_425006 243 34 � � NNS 10_1101-2021_01_02_425006 243 35 � � JJ 10_1101-2021_01_02_425006 243 36 � � NNP 10_1101-2021_01_02_425006 243 37 � � NNP 10_1101-2021_01_02_425006 243 38 , , , 10_1101-2021_01_02_425006 243 39 � � NNP 10_1101-2021_01_02_425006 243 40 � � ADD 10_1101-2021_01_02_425006 243 41 � � NNP 10_1101-2021_01_02_425006 243 42 = = VBZ 10_1101-2021_01_02_425006 243 43 � � ADD 10_1101-2021_01_02_425006 243 44 � � VBZ 10_1101-2021_01_02_425006 243 45 � � NN 10_1101-2021_01_02_425006 243 46 � � NNS 10_1101-2021_01_02_425006 243 47 � � VBZ 
10_1101-2021_01_02_425006 243 48 ∈ ∈ JJ 10_1101-2021_01_02_425006 243 49 � � NN 10_1101-2021_01_02_425006 243 50 ( ( -LRB- 10_1101-2021_01_02_425006 243 51 � � NNP 10_1101-2021_01_02_425006 243 52 � � NNP 10_1101-2021_01_02_425006 243 53 ) ) -RRB- 10_1101-2021_01_02_425006 243 54 , , , 10_1101-2021_01_02_425006 243 55 � � NNP 10_1101-2021_01_02_425006 243 56 � � NNP 10_1101-2021_01_02_425006 243 57 ∈ ∈ IN 10_1101-2021_01_02_425006 243 58 � � NNP 10_1101-2021_01_02_425006 243 59 ( ( -LRB- 10_1101-2021_01_02_425006 243 60 � � NNP 10_1101-2021_01_02_425006 243 61 � � NNP 10_1101-2021_01_02_425006 243 62 ) ) -RRB- 10_1101-2021_01_02_425006 243 63 � � NNP 10_1101-2021_01_02_425006 243 64 ( ( -LRB- 10_1101-2021_01_02_425006 243 65 � � NNP 10_1101-2021_01_02_425006 243 66 � � NNP 10_1101-2021_01_02_425006 243 67 , , , 10_1101-2021_01_02_425006 243 68 � � NNP 10_1101-2021_01_02_425006 243 69 � � NNP 10_1101-2021_01_02_425006 243 70 ) ) -RRB- 10_1101-2021_01_02_425006 243 71 363 363 CD 10_1101-2021_01_02_425006 243 72 where where WRB 10_1101-2021_01_02_425006 243 73 � � JJ 10_1101-2021_01_02_425006 243 74 � � NNP 10_1101-2021_01_02_425006 243 75 and and CC 10_1101-2021_01_02_425006 243 76 � � NNP 10_1101-2021_01_02_425006 243 77 � � NNP 10_1101-2021_01_02_425006 243 78 are be VBP 10_1101-2021_01_02_425006 243 79 the the DT 10_1101-2021_01_02_425006 243 80 GO GO NNP 10_1101-2021_01_02_425006 243 81 terms term NNS 10_1101-2021_01_02_425006 243 82 assigned assign VBN 10_1101-2021_01_02_425006 243 83 to to IN 10_1101-2021_01_02_425006 243 84 � � NNP 10_1101-2021_01_02_425006 243 85 � � NNP 10_1101-2021_01_02_425006 243 86 and and CC 10_1101-2021_01_02_425006 243 87 � � NNP 10_1101-2021_01_02_425006 243 88 � � NNP 10_1101-2021_01_02_425006 243 89 , , , 10_1101-2021_01_02_425006 243 90 respectively respectively RB 10_1101-2021_01_02_425006 243 91 ; ; : 10_1101-2021_01_02_425006 243 92 � � NNP 10_1101-2021_01_02_425006 243 93 ( ( -LRB- 10_1101-2021_01_02_425006 243 94 � � NNP 
10_1101-2021_01_02_425006 243 95 � � NNP 10_1101-2021_01_02_425006 243 96 , , , 10_1101-2021_01_02_425006 243 97 � � NNP 10_1101-2021_01_02_425006 243 98 � � NNP 10_1101-2021_01_02_425006 243 99 ) ) -RRB- 10_1101-2021_01_02_425006 243 100 is be VBZ 10_1101-2021_01_02_425006 243 101 the the DT 10_1101-2021_01_02_425006 243 102 maximal maximal JJ 10_1101-2021_01_02_425006 243 103 364 364 CD 10_1101-2021_01_02_425006 243 104 number number NN 10_1101-2021_01_02_425006 243 105 of of IN 10_1101-2021_01_02_425006 243 106 common common JJ 10_1101-2021_01_02_425006 243 107 terms term NNS 10_1101-2021_01_02_425006 243 108 between between IN 10_1101-2021_01_02_425006 243 109 paths path NNS 10_1101-2021_01_02_425006 243 110 in in IN 10_1101-2021_01_02_425006 243 111 the the DT 10_1101-2021_01_02_425006 243 112 two two CD 10_1101-2021_01_02_425006 243 113 GO GO NNP 10_1101-2021_01_02_425006 243 114 graphs graphs NN 10_1101-2021_01_02_425006 243 115 induced induce VBN 10_1101-2021_01_02_425006 243 116 by by IN 10_1101-2021_01_02_425006 243 117 the the DT 10_1101-2021_01_02_425006 243 118 GO GO NNP 10_1101-2021_01_02_425006 243 119 terms term NNS 10_1101-2021_01_02_425006 243 120 � � NNS 10_1101-2021_01_02_425006 243 121 � � NNP 10_1101-2021_01_02_425006 243 122 and and CC 10_1101-2021_01_02_425006 243 123 � � NNP 10_1101-2021_01_02_425006 243 124 � � NNP 10_1101-2021_01_02_425006 243 125 . . . 
10_1101-2021_01_02_425006 244 1 365 365 CD 10_1101-2021_01_02_425006 244 2 As as IN 10_1101-2021_01_02_425006 244 3 a a DT 10_1101-2021_01_02_425006 244 4 result result NN 10_1101-2021_01_02_425006 244 5 , , , 10_1101-2021_01_02_425006 244 6 the the DT 10_1101-2021_01_02_425006 244 7 mean mean JJ 10_1101-2021_01_02_425006 244 8 GO GO NNP 10_1101-2021_01_02_425006 244 9 similarity similarity NN 10_1101-2021_01_02_425006 244 10 score score NN 10_1101-2021_01_02_425006 244 11 was be VBD 10_1101-2021_01_02_425006 244 12 higher high JJR 10_1101-2021_01_02_425006 244 13 for for IN 10_1101-2021_01_02_425006 244 14 type type NN 10_1101-2021_01_02_425006 244 15 - - HYPH 10_1101-2021_01_02_425006 244 16 ii ii NN 10_1101-2021_01_02_425006 244 17 gene gene NN 10_1101-2021_01_02_425006 244 18 pairs pair NNS 10_1101-2021_01_02_425006 244 19 ( ( -LRB- 10_1101-2021_01_02_425006 244 20 5.97 5.97 CD 10_1101-2021_01_02_425006 244 21 versus versus IN 10_1101-2021_01_02_425006 244 22 4.04 4.04 CD 10_1101-2021_01_02_425006 244 23 for for IN 10_1101-2021_01_02_425006 244 24 366 366 CD 10_1101-2021_01_02_425006 244 25 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 244 26 and and CC 10_1101-2021_01_02_425006 244 27 5.86 5.86 CD 10_1101-2021_01_02_425006 244 28 versus versus IN 10_1101-2021_01_02_425006 244 29 3.91 3.91 CD 10_1101-2021_01_02_425006 244 30 for for IN 10_1101-2021_01_02_425006 244 31 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 244 32 ) ) -RRB- 10_1101-2021_01_02_425006 244 33 than than IN 10_1101-2021_01_02_425006 244 34 for for IN 10_1101-2021_01_02_425006 244 35 type type NN 10_1101-2021_01_02_425006 244 36 - - : 10_1101-2021_01_02_425006 244 37 i i PRP 10_1101-2021_01_02_425006 244 38 gene gene NN 10_1101-2021_01_02_425006 244 39 pairs pair NNS 10_1101-2021_01_02_425006 244 40 . . . 
10_1101-2021_01_02_425006 245 1 A a DT 10_1101-2021_01_02_425006 245 2 total total NN 10_1101-2021_01_02_425006 245 3 of of IN 10_1101-2021_01_02_425006 245 4 574/524 574/524 NNP 10_1101-2021_01_02_425006 245 5 367 367 CD 10_1101-2021_01_02_425006 245 6 type type NN 10_1101-2021_01_02_425006 245 7 - - HYPH 10_1101-2021_01_02_425006 245 8 ii ii NN 10_1101-2021_01_02_425006 245 9 gene gene NN 10_1101-2021_01_02_425006 245 10 pairs pair NNS 10_1101-2021_01_02_425006 245 11 had have VBD 10_1101-2021_01_02_425006 245 12 GO GO NNP 10_1101-2021_01_02_425006 245 13 similarity similarity NN 10_1101-2021_01_02_425006 245 14 scores score NNS 10_1101-2021_01_02_425006 245 15 greater great JJR 10_1101-2021_01_02_425006 245 16 than than IN 10_1101-2021_01_02_425006 245 17 four four CD 10_1101-2021_01_02_425006 245 18 ( ( -LRB- 10_1101-2021_01_02_425006 245 19 64%/63 64%/63 CD 10_1101-2021_01_02_425006 245 20 % % NN 10_1101-2021_01_02_425006 245 21 of of IN 10_1101-2021_01_02_425006 245 22 a a DT 10_1101-2021_01_02_425006 245 23 total total NN 10_1101-2021_01_02_425006 245 24 of of IN 10_1101-2021_01_02_425006 245 25 899/834 899/834 NNS 10_1101-2021_01_02_425006 245 26 ) ) -RRB- 10_1101-2021_01_02_425006 245 27 , , , 10_1101-2021_01_02_425006 245 28 while while IN 10_1101-2021_01_02_425006 245 29 368 368 CD 10_1101-2021_01_02_425006 245 30 only only RB 10_1101-2021_01_02_425006 245 31 461/404 461/404 CD 10_1101-2021_01_02_425006 245 32 type type NN 10_1101-2021_01_02_425006 245 33 - - : 10_1101-2021_01_02_425006 245 34 i i PRP 10_1101-2021_01_02_425006 245 35 gene gene NN 10_1101-2021_01_02_425006 245 36 pairs pair NNS 10_1101-2021_01_02_425006 245 37 had have VBD 10_1101-2021_01_02_425006 245 38 GO GO NNP 10_1101-2021_01_02_425006 245 39 similarity similarity NN 10_1101-2021_01_02_425006 245 40 scores score NNS 10_1101-2021_01_02_425006 245 41 greater great JJR 10_1101-2021_01_02_425006 245 42 than than IN 10_1101-2021_01_02_425006 245 43 four four CD 10_1101-2021_01_02_425006 
245 44 ( ( -LRB- 10_1101-2021_01_02_425006 245 45 36%/34 36%/34 CD 10_1101-2021_01_02_425006 245 46 % % NN 10_1101-2021_01_02_425006 245 47 of of IN 10_1101-2021_01_02_425006 245 48 a a DT 10_1101-2021_01_02_425006 245 49 total total NN 10_1101-2021_01_02_425006 245 50 of of IN 10_1101-2021_01_02_425006 245 51 369 369 CD 10_1101-2021_01_02_425006 245 52 1,274/1,179 1,274/1,179 NNS 10_1101-2021_01_02_425006 245 53 ) ) -RRB- 10_1101-2021_01_02_425006 245 54 for for IN 10_1101-2021_01_02_425006 245 55 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 245 56 / / SYM 10_1101-2021_01_02_425006 245 57 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 245 58 . . . 10_1101-2021_01_02_425006 246 1 We -PRON- PRP 10_1101-2021_01_02_425006 246 2 also also RB 10_1101-2021_01_02_425006 246 3 applied apply VBD 10_1101-2021_01_02_425006 246 4 a a DT 10_1101-2021_01_02_425006 246 5 c c NN 10_1101-2021_01_02_425006 246 6 � � NN 10_1101-2021_01_02_425006 246 7 -test -test , 10_1101-2021_01_02_425006 246 8 ( ( -LRB- 10_1101-2021_01_02_425006 246 9 44 44 CD 10_1101-2021_01_02_425006 246 10 ) ) -RRB- 10_1101-2021_01_02_425006 246 11 to to TO 10_1101-2021_01_02_425006 246 12 determine determine VB 10_1101-2021_01_02_425006 246 13 whether whether IN 10_1101-2021_01_02_425006 246 14 the the DT 10_1101-2021_01_02_425006 246 15 370 370 CD
10_1101-2021_01_02_425006 247 25 distribution distribution NN 10_1101-2021_01_02_425006 247 26 of of IN 10_1101-2021_01_02_425006 247 27 � � NNP 10_1101-2021_01_02_425006 247 28 � � JJ 10_1101-2021_01_02_425006 247 29 � � NNS 10_1101-2021_01_02_425006 247 30 � � JJ 10_1101-2021_01_02_425006 247 31 � � NNP 10_1101-2021_01_02_425006 247 32 � � NNP 10_1101-2021_01_02_425006 247 33 , , , 10_1101-2021_01_02_425006 247 34 � � NNP 10_1101-2021_01_02_425006 247 35 � � NNP 10_1101-2021_01_02_425006 247 36 � � NNP 10_1101-2021_01_02_425006 247 37 was be VBD 10_1101-2021_01_02_425006 247 38 different different JJ 10_1101-2021_01_02_425006 247 39 for for IN 10_1101-2021_01_02_425006 247 40 the the DT 10_1101-2021_01_02_425006 247 41 type type NN 10_1101-2021_01_02_425006 247 42 - - : 10_1101-2021_01_02_425006 247 43 i i PRP 10_1101-2021_01_02_425006 247 44 gene gene VBP 10_1101-2021_01_02_425006 247 45 pairs pair NNS 10_1101-2021_01_02_425006 247 46 and and CC 10_1101-2021_01_02_425006 247 47 type type NNP 10_1101-2021_01_02_425006 247 48 - - HYPH 10_1101-2021_01_02_425006 247 49 ii ii NN 10_1101-2021_01_02_425006 247 50 gene gene NN 10_1101-2021_01_02_425006 247 51 pairs pair NNS 10_1101-2021_01_02_425006 247 52 . . . 
10_1101-2021_01_02_425006 248 1 The the DT 10_1101-2021_01_02_425006 248 2 c c NNP 10_1101-2021_01_02_425006 248 3 � � , 10_1101-2021_01_02_425006 248 4 -371 -371 NN 10_1101-2021_01_02_425006 248 5 statistics statistic NNS 10_1101-2021_01_02_425006 248 6 corresponded correspond VBD 10_1101-2021_01_02_425006 248 7 to to IN 10_1101-2021_01_02_425006 248 8 a a DT 10_1101-2021_01_02_425006 248 9 P p NN 10_1101-2021_01_02_425006 248 10 - - HYPH 10_1101-2021_01_02_425006 248 11 value value NN 10_1101-2021_01_02_425006 248 12 less less JJR 10_1101-2021_01_02_425006 248 13 than than IN 10_1101-2021_01_02_425006 248 14 10 10 CD 10_1101-2021_01_02_425006 248 15 � � NNP 10_1101-2021_01_02_425006 248 16 � � NNP 10_1101-2021_01_02_425006 248 17 , , , 10_1101-2021_01_02_425006 248 18 which which WDT 10_1101-2021_01_02_425006 248 19 revealed reveal VBD 10_1101-2021_01_02_425006 248 20 that that IN 10_1101-2021_01_02_425006 248 21 the the DT 10_1101-2021_01_02_425006 248 22 distribution distribution NN 10_1101-2021_01_02_425006 248 23 of of IN 10_1101-2021_01_02_425006 248 24 � � NNP 10_1101-2021_01_02_425006 248 25 � � JJ 10_1101-2021_01_02_425006 248 26 � � NNS 10_1101-2021_01_02_425006 248 27 � � JJ 10_1101-2021_01_02_425006 248 28 � � NNP 10_1101-2021_01_02_425006 248 29 � � NNP 10_1101-2021_01_02_425006 248 30 , , , 10_1101-2021_01_02_425006 248 31 � � NNP 10_1101-2021_01_02_425006 248 32 � � NNP 10_1101-2021_01_02_425006 248 33 � � NNP 10_1101-2021_01_02_425006 248 34 372 372 CD 10_1101-2021_01_02_425006 248 35 for for IN 10_1101-2021_01_02_425006 248 36 the the DT 10_1101-2021_01_02_425006 248 37 type type NN 10_1101-2021_01_02_425006 248 38 - - HYPH 10_1101-2021_01_02_425006 248 39 ii ii NN 10_1101-2021_01_02_425006 248 40 gene gene NN 10_1101-2021_01_02_425006 248 41 pairs pairs NNP 10_1101-2021_01_02_425006 248 42 was be VBD 10_1101-2021_01_02_425006 248 43 significantly significantly RB 10_1101-2021_01_02_425006 248 44 different different JJ 10_1101-2021_01_02_425006 
248 45 from from IN 10_1101-2021_01_02_425006 248 46 the the DT 10_1101-2021_01_02_425006 248 47 type type NN 10_1101-2021_01_02_425006 248 48 - - : 10_1101-2021_01_02_425006 248 49 i i PRP 10_1101-2021_01_02_425006 248 50 gene gene NN 10_1101-2021_01_02_425006 248 51 pairs pair NNS 10_1101-2021_01_02_425006 248 52 . . . 10_1101-2021_01_02_425006 249 1 Fig fig NN 10_1101-2021_01_02_425006 249 2 . . . 10_1101-2021_01_02_425006 250 1 7B 7B NNP 10_1101-2021_01_02_425006 250 2 shows show VBZ 10_1101-2021_01_02_425006 250 3 the the DT 10_1101-2021_01_02_425006 250 4 373 373 CD 10_1101-2021_01_02_425006 250 5 distribution distribution NN 10_1101-2021_01_02_425006 250 6 of of IN 10_1101-2021_01_02_425006 250 7 � � NNP 10_1101-2021_01_02_425006 250 8 � � JJ 10_1101-2021_01_02_425006 250 9 � � NNS 10_1101-2021_01_02_425006 250 10 � � JJ 10_1101-2021_01_02_425006 250 11 � � NNP 10_1101-2021_01_02_425006 250 12 � � NNP 10_1101-2021_01_02_425006 250 13 , , , 10_1101-2021_01_02_425006 250 14 � � NNP 10_1101-2021_01_02_425006 250 15 � � ADD 10_1101-2021_01_02_425006 250 16 � � NNP 10_1101-2021_01_02_425006 250 17 for for IN 10_1101-2021_01_02_425006 250 18 the the DT 10_1101-2021_01_02_425006 250 19 type type NN 10_1101-2021_01_02_425006 250 20 - - : 10_1101-2021_01_02_425006 250 21 i i PRP 10_1101-2021_01_02_425006 250 22 gene gene VBP 10_1101-2021_01_02_425006 250 23 pairs pair NNS 10_1101-2021_01_02_425006 250 24 and and CC 10_1101-2021_01_02_425006 250 25 the the DT 10_1101-2021_01_02_425006 250 26 type type NN 10_1101-2021_01_02_425006 250 27 - - HYPH 10_1101-2021_01_02_425006 250 28 ii ii NN 10_1101-2021_01_02_425006 250 29 gene gene NN 10_1101-2021_01_02_425006 250 30 pairs pair NNS 10_1101-2021_01_02_425006 250 31 . . . 
10_1101-2021_01_02_425006 251 1 These these DT 10_1101-2021_01_02_425006 251 2 results result NNS 10_1101-2021_01_02_425006 251 3 strongly strongly RB 10_1101-2021_01_02_425006 251 4 374 374 CD 10_1101-2021_01_02_425006 251 5 indicated indicate VBD 10_1101-2021_01_02_425006 251 6 that that IN 10_1101-2021_01_02_425006 251 7 the the DT 10_1101-2021_01_02_425006 251 8 type type NN 10_1101-2021_01_02_425006 251 9 - - HYPH 10_1101-2021_01_02_425006 251 10 ii ii NN 10_1101-2021_01_02_425006 251 11 gene gene NN 10_1101-2021_01_02_425006 251 12 pairs pair NNS 10_1101-2021_01_02_425006 251 13 had have VBD 10_1101-2021_01_02_425006 251 14 a a DT 10_1101-2021_01_02_425006 251 15 higher high JJR 10_1101-2021_01_02_425006 251 16 degree degree NN 10_1101-2021_01_02_425006 251 17 of of IN 10_1101-2021_01_02_425006 251 18 GO GO NNP 10_1101-2021_01_02_425006 251 19 similarity similarity NN 10_1101-2021_01_02_425006 251 20 than than IN 10_1101-2021_01_02_425006 251 21 the the DT 10_1101-2021_01_02_425006 251 22 type type NN 10_1101-2021_01_02_425006 251 23 - - : 10_1101-2021_01_02_425006 251 24 i i PRP 10_1101-2021_01_02_425006 251 25 gene gene NN 10_1101-2021_01_02_425006 251 26 pairs pair NNS 10_1101-2021_01_02_425006 251 27 , , , 10_1101-2021_01_02_425006 251 28 375 375 CD 10_1101-2021_01_02_425006 251 29 suggesting suggest VBG 10_1101-2021_01_02_425006 251 30 that that IN 10_1101-2021_01_02_425006 251 31 the the DT 10_1101-2021_01_02_425006 251 32 gene gene NN 10_1101-2021_01_02_425006 251 33 pairs pair NNS 10_1101-2021_01_02_425006 251 34 frequently frequently RB 10_1101-2021_01_02_425006 251 35 encoded encode VBN 10_1101-2021_01_02_425006 251 36 in in IN 10_1101-2021_01_02_425006 251 37 the the DT 10_1101-2021_01_02_425006 251 38 same same JJ 10_1101-2021_01_02_425006 251 39 ATUs atu NNS 10_1101-2021_01_02_425006 251 40 ( ( -LRB- 10_1101-2021_01_02_425006 251 41 type type NN 10_1101-2021_01_02_425006 251 42 - - HYPH 10_1101-2021_01_02_425006 251 43 ii ii NN 
10_1101-2021_01_02_425006 251 44 gene gene NN 10_1101-2021_01_02_425006 251 45 pairs pairs NNP 10_1101-2021_01_02_425006 251 46 ) ) -RRB- 10_1101-2021_01_02_425006 251 47 are be VBP 10_1101-2021_01_02_425006 251 48 more more RBR 10_1101-2021_01_02_425006 251 49 376 376 CD 10_1101-2021_01_02_425006 251 50 functionally functionally RB 10_1101-2021_01_02_425006 251 51 related relate VBN 10_1101-2021_01_02_425006 251 52 than than IN 10_1101-2021_01_02_425006 251 53 those those DT 10_1101-2021_01_02_425006 251 54 that that WDT 10_1101-2021_01_02_425006 251 55 can can MD 10_1101-2021_01_02_425006 251 56 belong belong VB 10_1101-2021_01_02_425006 251 57 to to IN 10_1101-2021_01_02_425006 251 58 two two CD 10_1101-2021_01_02_425006 251 59 distinct distinct JJ 10_1101-2021_01_02_425006 251 60 ATUs atu NNS 10_1101-2021_01_02_425006 251 61 ( ( -LRB- 10_1101-2021_01_02_425006 251 62 type type NN 10_1101-2021_01_02_425006 251 63 - - : 10_1101-2021_01_02_425006 251 64 i i PRP 10_1101-2021_01_02_425006 251 65 gene gene VBP 10_1101-2021_01_02_425006 251 66 pairs pair NNS 10_1101-2021_01_02_425006 251 67 ) ) -RRB- 10_1101-2021_01_02_425006 251 68 . . . 
10_1101-2021_01_02_425006 252 1 377 377 CD 10_1101-2021_01_02_425006 252 2 We -PRON- PRP 10_1101-2021_01_02_425006 252 3 also also RB 10_1101-2021_01_02_425006 252 4 carried carry VBD 10_1101-2021_01_02_425006 252 5 out out RP 10_1101-2021_01_02_425006 252 6 a a DT 10_1101-2021_01_02_425006 252 7 similar similar JJ 10_1101-2021_01_02_425006 252 8 analysis analysis NN 10_1101-2021_01_02_425006 252 9 of of IN 10_1101-2021_01_02_425006 252 10 the the DT 10_1101-2021_01_02_425006 252 11 two two CD 10_1101-2021_01_02_425006 252 12 different different JJ 10_1101-2021_01_02_425006 252 13 gene gene NN 10_1101-2021_01_02_425006 252 14 pairs pair NNS 10_1101-2021_01_02_425006 252 15 based base VBN 10_1101-2021_01_02_425006 252 16 on on IN 10_1101-2021_01_02_425006 252 17 KEGG KEGG NNP 10_1101-2021_01_02_425006 252 18 enrichment enrichment NN 10_1101-2021_01_02_425006 252 19 378 378 CD 10_1101-2021_01_02_425006 252 20 analysis analysis NN 10_1101-2021_01_02_425006 252 21 ( ( -LRB- 10_1101-2021_01_02_425006 252 22 see see VB 10_1101-2021_01_02_425006 252 23 more more JJR 10_1101-2021_01_02_425006 252 24 details detail NNS 10_1101-2021_01_02_425006 252 25 in in IN 10_1101-2021_01_02_425006 252 26 method method NN 10_1101-2021_01_02_425006 252 27 S9 s9 NN 10_1101-2021_01_02_425006 252 28 ) ) -RRB- 10_1101-2021_01_02_425006 252 29 and and CC 10_1101-2021_01_02_425006 252 30 found find VBD 10_1101-2021_01_02_425006 252 31 that that IN 10_1101-2021_01_02_425006 252 32 the the DT 10_1101-2021_01_02_425006 252 33 proportion proportion NN 10_1101-2021_01_02_425006 252 34 of of IN 10_1101-2021_01_02_425006 252 35 type type NN 10_1101-2021_01_02_425006 252 36 - - HYPH 10_1101-2021_01_02_425006 252 37 ii ii NN 10_1101-2021_01_02_425006 252 38 gene gene NN 10_1101-2021_01_02_425006 252 39 pairs pair NNS 10_1101-2021_01_02_425006 252 40 ( ( -LRB- 10_1101-2021_01_02_425006 252 41 59%/57 59%/57 CD 10_1101-2021_01_02_425006 252 42 % % NN 10_1101-2021_01_02_425006 252 43 379 379 CD 
10_1101-2021_01_02_425006 252 44 for for IN 10_1101-2021_01_02_425006 252 45 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 252 46 / / SYM 10_1101-2021_01_02_425006 252 47 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 252 48 ) ) -RRB- 10_1101-2021_01_02_425006 252 49 , , , 10_1101-2021_01_02_425006 252 50 whose whose WP$ 10_1101-2021_01_02_425006 252 51 two two CD 10_1101-2021_01_02_425006 252 52 genes gene NNS 10_1101-2021_01_02_425006 252 53 were be VBD 10_1101-2021_01_02_425006 252 54 contained contain VBN 10_1101-2021_01_02_425006 252 55 in in IN 10_1101-2021_01_02_425006 252 56 the the DT 10_1101-2021_01_02_425006 252 57 same same JJ 10_1101-2021_01_02_425006 252 58 KEGG KEGG NNP 10_1101-2021_01_02_425006 252 59 pathway pathway NN 10_1101-2021_01_02_425006 252 60 , , , 10_1101-2021_01_02_425006 252 61 was be VBD 10_1101-2021_01_02_425006 252 62 380 380 CD 10_1101-2021_01_02_425006 252 63 higher high JJR 10_1101-2021_01_02_425006 252 64 than than IN 10_1101-2021_01_02_425006 252 65 the the DT 10_1101-2021_01_02_425006 252 66 proportion proportion NN 10_1101-2021_01_02_425006 252 67 of of IN 10_1101-2021_01_02_425006 252 68 type type NN 10_1101-2021_01_02_425006 252 69 - - : 10_1101-2021_01_02_425006 252 70 i i PRP 10_1101-2021_01_02_425006 252 71 gene gene NN 10_1101-2021_01_02_425006 252 72 pairs pair NNS 10_1101-2021_01_02_425006 252 73 ( ( -LRB- 10_1101-2021_01_02_425006 252 74 32%/28 32%/28 CD 10_1101-2021_01_02_425006 252 75 % % NN 10_1101-2021_01_02_425006 252 76 for for IN 10_1101-2021_01_02_425006 252 77 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 252 78 / / SYM 10_1101-2021_01_02_425006 252 79 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 252 80 ) ) -RRB- 10_1101-2021_01_02_425006 252 81 ( ( -LRB- 10_1101-2021_01_02_425006 252 82 Fig fig NN 10_1101-2021_01_02_425006 252 83 . . . 10_1101-2021_01_02_425006 253 1 7C 7C NNP 10_1101-2021_01_02_425006 253 2 ) ) -RRB- 10_1101-2021_01_02_425006 253 3 . . . 
10_1101-2021_01_02_425006 254 1 381 381 CD 10_1101-2021_01_02_425006 254 2 The the DT 10_1101-2021_01_02_425006 254 3 distribution distribution NN 10_1101-2021_01_02_425006 254 4 of of IN 10_1101-2021_01_02_425006 254 5 the the DT 10_1101-2021_01_02_425006 254 6 KEGG KEGG NNP 10_1101-2021_01_02_425006 254 7 similarity similarity NN 10_1101-2021_01_02_425006 254 8 scores score NNS 10_1101-2021_01_02_425006 254 9 of of IN 10_1101-2021_01_02_425006 254 10 the the DT 10_1101-2021_01_02_425006 254 11 two two CD 10_1101-2021_01_02_425006 254 12 different different JJ 10_1101-2021_01_02_425006 254 13 types type NNS 10_1101-2021_01_02_425006 254 14 of of IN 10_1101-2021_01_02_425006 254 15 gene gene NN 10_1101-2021_01_02_425006 254 16 pairs pair NNS 10_1101-2021_01_02_425006 254 17 is be VBZ 10_1101-2021_01_02_425006 254 18 shown show VBN 10_1101-2021_01_02_425006 254 19 in in IN 10_1101-2021_01_02_425006 254 20 Fig Fig NNP 10_1101-2021_01_02_425006 254 21 . . . 10_1101-2021_01_02_425006 255 1 382 382 CD 10_1101-2021_01_02_425006 255 2 7D 7D NNS 10_1101-2021_01_02_425006 255 3 , , , 10_1101-2021_01_02_425006 255 4 suggesting suggest VBG 10_1101-2021_01_02_425006 255 5 that that IN 10_1101-2021_01_02_425006 255 6 genes gene NNS 10_1101-2021_01_02_425006 255 7 of of IN 10_1101-2021_01_02_425006 255 8 type type NN 10_1101-2021_01_02_425006 255 9 - - HYPH 10_1101-2021_01_02_425006 255 10 ii ii NN 10_1101-2021_01_02_425006 255 11 gene gene NN 10_1101-2021_01_02_425006 255 12 pairs pair NNS 10_1101-2021_01_02_425006 255 13 have have VBP 10_1101-2021_01_02_425006 255 14 a a DT 10_1101-2021_01_02_425006 255 15 higher high JJR 10_1101-2021_01_02_425006 255 16 probability probability NN 10_1101-2021_01_02_425006 255 17 of of IN 10_1101-2021_01_02_425006 255 18 participating participate VBG 10_1101-2021_01_02_425006 255 19 in in IN 10_1101-2021_01_02_425006 255 20 the the DT 10_1101-2021_01_02_425006 255 21 same same JJ 10_1101-2021_01_02_425006 255 22 383 383 CD 
10_1101-2021_01_02_425006 255 23 KEGG KEGG NNP 10_1101-2021_01_02_425006 255 24 pathway pathway NN 10_1101-2021_01_02_425006 255 25 than than IN 10_1101-2021_01_02_425006 255 26 those those DT 10_1101-2021_01_02_425006 255 27 of of IN 10_1101-2021_01_02_425006 255 28 type type NN 10_1101-2021_01_02_425006 255 29 - - : 10_1101-2021_01_02_425006 255 30 i i PRP 10_1101-2021_01_02_425006 255 31 gene gene NN 10_1101-2021_01_02_425006 255 32 pairs pair NNS 10_1101-2021_01_02_425006 255 33 . . . 10_1101-2021_01_02_425006 256 1 384 384 CD 10_1101-2021_01_02_425006 256 2 Please please UH 10_1101-2021_01_02_425006 256 3 place place VB 10_1101-2021_01_02_425006 256 4 Fig Fig NNP 10_1101-2021_01_02_425006 256 5 . . . 10_1101-2021_01_02_425006 257 1 7 7 CD 10_1101-2021_01_02_425006 257 2 here here RB 10_1101-2021_01_02_425006 257 3 . . . 10_1101-2021_01_02_425006 258 1 385 385 CD 10_1101-2021_01_02_425006 258 2 DISCUSSION discussion NN 10_1101-2021_01_02_425006 258 3 386 386 CD 10_1101-2021_01_02_425006 258 4 We -PRON- PRP 10_1101-2021_01_02_425006 258 5 developed develop VBD 10_1101-2021_01_02_425006 258 6 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 258 7 , , , 10_1101-2021_01_02_425006 258 8 the the DT 10_1101-2021_01_02_425006 258 9 first first JJ 10_1101-2021_01_02_425006 258 10 computational computational JJ 10_1101-2021_01_02_425006 258 11 method method NN 10_1101-2021_01_02_425006 258 12 for for IN 10_1101-2021_01_02_425006 258 13 genome genome NN 10_1101-2021_01_02_425006 258 14 - - HYPH 10_1101-2021_01_02_425006 258 15 scale scale NN 10_1101-2021_01_02_425006 258 16 ATU ATU NNP 10_1101-2021_01_02_425006 258 17 prediction prediction NN 10_1101-2021_01_02_425006 258 18 by by IN 10_1101-2021_01_02_425006 258 19 analyzing analyze VBG 10_1101-2021_01_02_425006 258 20 387 387 CD 10_1101-2021_01_02_425006 258 21 next- next- JJ 10_1101-2021_01_02_425006 258 22 and and CC 10_1101-2021_01_02_425006 258 23 third third JJ 10_1101-2021_01_02_425006 258 24 - - HYPH 
10_1101-2021_01_02_425006 258 25 generation generation NN 10_1101-2021_01_02_425006 258 26 RNA RNA NNP 10_1101-2021_01_02_425006 258 27 - - HYPH 10_1101-2021_01_02_425006 258 28 Seq Seq NNP 10_1101-2021_01_02_425006 258 29 data datum NNS 10_1101-2021_01_02_425006 258 30 , , , 10_1101-2021_01_02_425006 258 31 using use VBG 10_1101-2021_01_02_425006 258 32 a a DT 10_1101-2021_01_02_425006 258 33 CQP CQP NNP 10_1101-2021_01_02_425006 258 34 model model NN 10_1101-2021_01_02_425006 258 35 . . . 10_1101-2021_01_02_425006 259 1 Linear linear JJ 10_1101-2021_01_02_425006 259 2 constraints constraint NNS 10_1101-2021_01_02_425006 259 3 provided provide VBN 10_1101-2021_01_02_425006 259 4 by by IN 10_1101-2021_01_02_425006 259 5 the the DT 10_1101-2021_01_02_425006 259 6 bias bias NN 10_1101-2021_01_02_425006 259 7 388 388 CD
10_1101-2021_01_02_425006 260 25 rate rate NN 10_1101-2021_01_02_425006 260 26 of of IN 10_1101-2021_01_02_425006 260 27 read read VBN 10_1101-2021_01_02_425006 260 28 distribution distribution NN 10_1101-2021_01_02_425006 260 29 were be VBD 10_1101-2021_01_02_425006 260 30 , , , 10_1101-2021_01_02_425006 260 31 for for IN 10_1101-2021_01_02_425006 260 32 the the DT 10_1101-2021_01_02_425006 260 33 first first JJ 10_1101-2021_01_02_425006 260 34 time time NN 10_1101-2021_01_02_425006 260 35 , , , 10_1101-2021_01_02_425006 260 36 integrated integrate VBN 10_1101-2021_01_02_425006 260 37 into into IN 10_1101-2021_01_02_425006 260 38 the the DT 10_1101-2021_01_02_425006 260 39 CQP CQP NNP 10_1101-2021_01_02_425006 260 40 model model NN 10_1101-2021_01_02_425006 260 41 . . . 
10_1101-2021_01_02_425006 261 1 Positional positional JJ 10_1101-2021_01_02_425006 261 2 bias bias NN 10_1101-2021_01_02_425006 261 3 refers refer VBZ 10_1101-2021_01_02_425006 261 4 to to IN 10_1101-2021_01_02_425006 261 5 389 389 CD 10_1101-2021_01_02_425006 261 6 the the DT 10_1101-2021_01_02_425006 261 7 non non JJ 10_1101-2021_01_02_425006 261 8 - - JJ 10_1101-2021_01_02_425006 261 9 uniform uniform JJ 10_1101-2021_01_02_425006 261 10 distribution distribution NN 10_1101-2021_01_02_425006 261 11 of of IN 10_1101-2021_01_02_425006 261 12 reads read NNS 10_1101-2021_01_02_425006 261 13 over over IN 10_1101-2021_01_02_425006 261 14 different different JJ 10_1101-2021_01_02_425006 261 15 positions position NNS 10_1101-2021_01_02_425006 261 16 of of IN 10_1101-2021_01_02_425006 261 17 a a DT 10_1101-2021_01_02_425006 261 18 transcript transcript NN 10_1101-2021_01_02_425006 261 19 ( ( -LRB- 10_1101-2021_01_02_425006 261 20 33 33 CD 10_1101-2021_01_02_425006 261 21 , , , 10_1101-2021_01_02_425006 261 22 35 35 CD 10_1101-2021_01_02_425006 261 23 ) ) -RRB- 10_1101-2021_01_02_425006 261 24 , , , 10_1101-2021_01_02_425006 261 25 which which WDT 10_1101-2021_01_02_425006 261 26 is be VBZ 10_1101-2021_01_02_425006 261 27 handled handle VBN 10_1101-2021_01_02_425006 261 28 390 390 CD 10_1101-2021_01_02_425006 261 29 by by IN 10_1101-2021_01_02_425006 261 30 learning learn VBG 10_1101-2021_01_02_425006 261 31 non non JJ 10_1101-2021_01_02_425006 261 32 - - JJ 10_1101-2021_01_02_425006 261 33 uniform uniform JJ 10_1101-2021_01_02_425006 261 34 read read VB 10_1101-2021_01_02_425006 261 35 distributions distribution NNS 10_1101-2021_01_02_425006 261 36 from from IN 10_1101-2021_01_02_425006 261 37 given give VBN 10_1101-2021_01_02_425006 261 38 RNA RNA NNP 10_1101-2021_01_02_425006 261 39 - - HYPH 10_1101-2021_01_02_425006 261 40 Seq Seq NNP 10_1101-2021_01_02_425006 261 41 reads read VBZ 10_1101-2021_01_02_425006 261 42 ( ( -LRB- 10_1101-2021_01_02_425006 261 43 32 32 CD 
10_1101-2021_01_02_425006 261 44 ) ) -RRB- 10_1101-2021_01_02_425006 261 45 or or CC 10_1101-2021_01_02_425006 261 46 modeling model VBG 10_1101-2021_01_02_425006 261 47 the the DT 10_1101-2021_01_02_425006 261 48 RNA RNA NNP 10_1101-2021_01_02_425006 261 49 391 391 CD 10_1101-2021_01_02_425006 261 50 degradation degradation NN 10_1101-2021_01_02_425006 261 51 ( ( -LRB- 10_1101-2021_01_02_425006 261 52 45 45 CD 10_1101-2021_01_02_425006 261 53 ) ) -RRB- 10_1101-2021_01_02_425006 261 54 . . . 10_1101-2021_01_02_425006 262 1 The the DT 10_1101-2021_01_02_425006 262 2 bias bias NN 10_1101-2021_01_02_425006 262 3 rate rate NN 10_1101-2021_01_02_425006 262 4 function function NN 10_1101-2021_01_02_425006 262 5 we -PRON- PRP 10_1101-2021_01_02_425006 262 6 proposed propose VBD 10_1101-2021_01_02_425006 262 7 can can MD 10_1101-2021_01_02_425006 262 8 address address VB 10_1101-2021_01_02_425006 262 9 the the DT 10_1101-2021_01_02_425006 262 10 non non JJ 10_1101-2021_01_02_425006 262 11 - - JJ 10_1101-2021_01_02_425006 262 12 uniform uniform JJ 10_1101-2021_01_02_425006 262 13 read read VBD 10_1101-2021_01_02_425006 262 14 distribution distribution NN 10_1101-2021_01_02_425006 262 15 392 392 CD 10_1101-2021_01_02_425006 262 16 along along IN 10_1101-2021_01_02_425006 262 17 mRNA mRNA NNS 10_1101-2021_01_02_425006 262 18 transcripts transcript NNS 10_1101-2021_01_02_425006 262 19 and and CC 10_1101-2021_01_02_425006 262 20 also also RB 10_1101-2021_01_02_425006 262 21 be be VB 10_1101-2021_01_02_425006 262 22 desirable desirable JJ 10_1101-2021_01_02_425006 262 23 for for IN 10_1101-2021_01_02_425006 262 24 standard standard JJ 10_1101-2021_01_02_425006 262 25 next next JJ 10_1101-2021_01_02_425006 262 26 - - HYPH 10_1101-2021_01_02_425006 262 27 generation generation NN 10_1101-2021_01_02_425006 262 28 RNA RNA NNP 10_1101-2021_01_02_425006 262 29 - - HYPH 10_1101-2021_01_02_425006 262 30 Seq Seq NNP 10_1101-2021_01_02_425006 262 31 data datum NNS 
10_1101-2021_01_02_425006 262 32 that that WDT 10_1101-2021_01_02_425006 262 33 involves involve VBZ 10_1101-2021_01_02_425006 262 34 393 393 CD 10_1101-2021_01_02_425006 262 35 more more RBR 10_1101-2021_01_02_425006 262 36 degraded degraded JJ 10_1101-2021_01_02_425006 262 37 mRNAs mrnas ADD 10_1101-2021_01_02_425006 262 38 , , , 10_1101-2021_01_02_425006 262 39 as as IN 10_1101-2021_01_02_425006 262 40 the the DT 10_1101-2021_01_02_425006 262 41 exponential exponential JJ 10_1101-2021_01_02_425006 262 42 function function NN 10_1101-2021_01_02_425006 262 43 has have VBZ 10_1101-2021_01_02_425006 262 44 been be VBN 10_1101-2021_01_02_425006 262 45 used use VBN 10_1101-2021_01_02_425006 262 46 to to TO 10_1101-2021_01_02_425006 262 47 model model VB 10_1101-2021_01_02_425006 262 48 the the DT 10_1101-2021_01_02_425006 262 49 degradation degradation NN 10_1101-2021_01_02_425006 262 50 of of IN 10_1101-2021_01_02_425006 262 51 mRNA mRNA NNP 10_1101-2021_01_02_425006 262 52 394 394 CD 10_1101-2021_01_02_425006 262 53 transcripts transcript NNS 10_1101-2021_01_02_425006 262 54 ( ( -LRB- 10_1101-2021_01_02_425006 262 55 45 45 CD 10_1101-2021_01_02_425006 262 56 ) ) -RRB- 10_1101-2021_01_02_425006 262 57 . . . 
10_1101-2021_01_02_425006 263 1 As as IN 10_1101-2021_01_02_425006 263 2 a a DT 10_1101-2021_01_02_425006 263 3 result result NN 10_1101-2021_01_02_425006 263 4 , , , 10_1101-2021_01_02_425006 263 5 a a DT 10_1101-2021_01_02_425006 263 6 total total NN 10_1101-2021_01_02_425006 263 7 of of IN 10_1101-2021_01_02_425006 263 8 2,973 2,973 CD 10_1101-2021_01_02_425006 263 9 distinct distinct JJ 10_1101-2021_01_02_425006 263 10 ATUs atu NNS 10_1101-2021_01_02_425006 263 11 for for IN 10_1101-2021_01_02_425006 263 12 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 263 13 and and CC 10_1101-2021_01_02_425006 263 14 2,767 2,767 CD 10_1101-2021_01_02_425006 263 15 distinct distinct JJ 10_1101-2021_01_02_425006 263 16 ATUs atu NNS 10_1101-2021_01_02_425006 263 17 395 395 CD 10_1101-2021_01_02_425006 263 18 for for IN 10_1101-2021_01_02_425006 263 19 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 263 20 were be VBD 10_1101-2021_01_02_425006 263 21 identified identify VBN 10_1101-2021_01_02_425006 263 22 by by IN 10_1101-2021_01_02_425006 263 23 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 263 24 . . . 
10_1101-2021_01_02_425006 264 1 The the DT 10_1101-2021_01_02_425006 264 2 precision precision NN 10_1101-2021_01_02_425006 264 3 and and CC 10_1101-2021_01_02_425006 264 4 recall recall NN 10_1101-2021_01_02_425006 264 5 reached reach VBD 10_1101-2021_01_02_425006 264 6 0.67/0.64 0.67/0.64 NNP 10_1101-2021_01_02_425006 264 7 and and CC 10_1101-2021_01_02_425006 264 8 0.67/0.68 0.67/0.68 NN 10_1101-2021_01_02_425006 264 9 , , , 10_1101-2021_01_02_425006 264 10 396 396 CD 10_1101-2021_01_02_425006 264 11 respectively respectively RB 10_1101-2021_01_02_425006 264 12 , , , 10_1101-2021_01_02_425006 264 13 based base VBN 10_1101-2021_01_02_425006 264 14 on on IN 10_1101-2021_01_02_425006 264 15 perfect perfect JJ 10_1101-2021_01_02_425006 264 16 matching matching NN 10_1101-2021_01_02_425006 264 17 and and CC 10_1101-2021_01_02_425006 264 18 0.77/0.74 0.77/0.74 NN 10_1101-2021_01_02_425006 264 19 and and CC 10_1101-2021_01_02_425006 264 20 0.75/0.76 0.75/0.76 NN 10_1101-2021_01_02_425006 264 21 , , , 10_1101-2021_01_02_425006 264 22 respectively respectively RB 10_1101-2021_01_02_425006 264 23 , , , 10_1101-2021_01_02_425006 264 24 based base VBN 10_1101-2021_01_02_425006 264 25 on on IN 10_1101-2021_01_02_425006 264 26 relaxed relaxed JJ 10_1101-2021_01_02_425006 264 27 397 397 CD 10_1101-2021_01_02_425006 264 28 matching match VBG 10_1101-2021_01_02_425006 264 29 for for IN 10_1101-2021_01_02_425006 264 30 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 264 31 / / SYM 10_1101-2021_01_02_425006 264 32 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 264 33 . . . 
10_1101-2021_01_02_425006 265 1 We -PRON- PRP 10_1101-2021_01_02_425006 265 2 further further RB 10_1101-2021_01_02_425006 265 3 validated validate VBD 10_1101-2021_01_02_425006 265 4 predicted predict VBD 10_1101-2021_01_02_425006 265 5 ATUs atu NNS 10_1101-2021_01_02_425006 265 6 using use VBG 10_1101-2021_01_02_425006 265 7 experimental experimental JJ 10_1101-2021_01_02_425006 265 8 398 398 CD 10_1101-2021_01_02_425006 265 9 transcription transcription NN 10_1101-2021_01_02_425006 265 10 factor factor NN 10_1101-2021_01_02_425006 265 11 binding bind VBG 10_1101-2021_01_02_425006 265 12 sites site NNS 10_1101-2021_01_02_425006 265 13 or or CC 10_1101-2021_01_02_425006 265 14 transcription transcription NN 10_1101-2021_01_02_425006 265 15 termination termination NN 10_1101-2021_01_02_425006 265 16 sites site NNS 10_1101-2021_01_02_425006 265 17 from from IN 10_1101-2021_01_02_425006 265 18 RegulonDB regulondb NN 10_1101-2021_01_02_425006 265 19 and and CC 10_1101-2021_01_02_425006 265 20 SEnd SEnd NNP 10_1101-2021_01_02_425006 265 21 - - HYPH 10_1101-2021_01_02_425006 265 22 Seq Seq NNP 10_1101-2021_01_02_425006 265 23 . . . 
10_1101-2021_01_02_425006 266 1 In in IN 10_1101-2021_01_02_425006 266 2 399 399 CD 10_1101-2021_01_02_425006 266 3 addition addition NN 10_1101-2021_01_02_425006 266 4 , , , 10_1101-2021_01_02_425006 266 5 the the DT 10_1101-2021_01_02_425006 266 6 proportion proportion NN 10_1101-2021_01_02_425006 266 7 of of IN 10_1101-2021_01_02_425006 266 8 the the DT 10_1101-2021_01_02_425006 266 9 5’- 5’- CD 10_1101-2021_01_02_425006 266 10 or or CC 10_1101-2021_01_02_425006 266 11 3’-end 3’-end CD 10_1101-2021_01_02_425006 266 12 genes gene NNS 10_1101-2021_01_02_425006 266 13 of of IN 10_1101-2021_01_02_425006 266 14 predicted predict VBN 10_1101-2021_01_02_425006 266 15 ATUs atu NNS 10_1101-2021_01_02_425006 266 16 that that WDT 10_1101-2021_01_02_425006 266 17 were be VBD 10_1101-2021_01_02_425006 266 18 validated validate VBN 10_1101-2021_01_02_425006 266 19 by by IN 10_1101-2021_01_02_425006 266 20 400 400 CD 10_1101-2021_01_02_425006 266 21 experimental experimental JJ 10_1101-2021_01_02_425006 266 22 transcription transcription NN 10_1101-2021_01_02_425006 266 23 factor factor NN 10_1101-2021_01_02_425006 266 24 binding bind VBG 10_1101-2021_01_02_425006 266 25 sites site NNS 10_1101-2021_01_02_425006 266 26 and and CC 10_1101-2021_01_02_425006 266 27 transcription transcription NN 10_1101-2021_01_02_425006 266 28 termination termination NN 10_1101-2021_01_02_425006 266 29 sites site NNS 10_1101-2021_01_02_425006 266 30 was be VBD 10_1101-2021_01_02_425006 266 31 over over IN 10_1101-2021_01_02_425006 266 32 three three CD 10_1101-2021_01_02_425006 266 33 times time NNS 10_1101-2021_01_02_425006 266 34 401 401 CD 10_1101-2021_01_02_425006 266 35 greater great JJR 10_1101-2021_01_02_425006 266 36 than than IN 10_1101-2021_01_02_425006 266 37 that that DT 10_1101-2021_01_02_425006 266 38 of of IN 10_1101-2021_01_02_425006 266 39 no no DT 10_1101-2021_01_02_425006 266 40 5’- 5’- NN 10_1101-2021_01_02_425006 266 41 or or CC 10_1101-2021_01_02_425006 266 42 3’-end 3’-end 
CD 10_1101-2021_01_02_425006 266 43 genes gene NNS 10_1101-2021_01_02_425006 266 44 , , , 10_1101-2021_01_02_425006 266 45 demonstrating demonstrate VBG 10_1101-2021_01_02_425006 266 46 the the DT 10_1101-2021_01_02_425006 266 47 high high JJ 10_1101-2021_01_02_425006 266 48 reliability reliability NN 10_1101-2021_01_02_425006 266 49 of of IN 10_1101-2021_01_02_425006 266 50 predicted predict VBN 10_1101-2021_01_02_425006 266 51 ATUs atu NNS 10_1101-2021_01_02_425006 266 52 . . . 10_1101-2021_01_02_425006 267 1 Gene gene NN 10_1101-2021_01_02_425006 267 2 402 402 CD 10_1101-2021_01_02_425006 267 3 pairs pair NNS 10_1101-2021_01_02_425006 267 4 frequently frequently RB 10_1101-2021_01_02_425006 267 5 encoded encode VBN 10_1101-2021_01_02_425006 267 6 in in IN 10_1101-2021_01_02_425006 267 7 the the DT 10_1101-2021_01_02_425006 267 8 same same JJ 10_1101-2021_01_02_425006 267 9 ATUs atu NNS 10_1101-2021_01_02_425006 267 10 were be VBD 10_1101-2021_01_02_425006 267 11 more more RBR 10_1101-2021_01_02_425006 267 12 functionally functionally RB 10_1101-2021_01_02_425006 267 13 related related JJ 10_1101-2021_01_02_425006 267 14 than than IN 10_1101-2021_01_02_425006 267 15 those those DT 10_1101-2021_01_02_425006 267 16 that that WDT 10_1101-2021_01_02_425006 267 17 can can MD 10_1101-2021_01_02_425006 267 18 belong belong VB 10_1101-2021_01_02_425006 267 19 to to IN 10_1101-2021_01_02_425006 267 20 403 403 CD 10_1101-2021_01_02_425006 267 21 two two CD 10_1101-2021_01_02_425006 267 22 distinct distinct JJ 10_1101-2021_01_02_425006 267 23 ATUs atu NNS 10_1101-2021_01_02_425006 267 24 according accord VBG 10_1101-2021_01_02_425006 267 25 to to IN 10_1101-2021_01_02_425006 267 26 GO GO NNP 10_1101-2021_01_02_425006 267 27 and and CC 10_1101-2021_01_02_425006 267 28 KEGG KEGG NNP 10_1101-2021_01_02_425006 267 29 enrichment enrichment NN 10_1101-2021_01_02_425006 267 30 analyses analysis NNS 10_1101-2021_01_02_425006 267 31 . . . 
10_1101-2021_01_02_425006 268 1 These these DT 10_1101-2021_01_02_425006 268 2 results result NNS 10_1101-2021_01_02_425006 268 3 demonstrated demonstrate VBD 10_1101-2021_01_02_425006 268 4 the the DT 10_1101-2021_01_02_425006 268 5 404 404 CD 10_1101-2021_01_02_425006 268 6 reliability reliability NN 10_1101-2021_01_02_425006 268 7 and and CC 10_1101-2021_01_02_425006 268 8 accuracy accuracy NN 10_1101-2021_01_02_425006 268 9 of of IN 10_1101-2021_01_02_425006 268 10 our -PRON- PRP$ 10_1101-2021_01_02_425006 268 11 predicted predict VBN 10_1101-2021_01_02_425006 268 12 ATUs atu NNS 10_1101-2021_01_02_425006 268 13 , , , 10_1101-2021_01_02_425006 268 14 implying imply VBG 10_1101-2021_01_02_425006 268 15 the the DT 10_1101-2021_01_02_425006 268 16 ability ability NN 10_1101-2021_01_02_425006 268 17 of of IN 10_1101-2021_01_02_425006 268 18 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 268 19 to to TO 10_1101-2021_01_02_425006 268 20 reveal reveal VB 10_1101-2021_01_02_425006 268 21 the the DT 10_1101-2021_01_02_425006 268 22 405 405 CD 10_1101-2021_01_02_425006 268 23 transcriptional transcriptional JJ 10_1101-2021_01_02_425006 268 24 architecture architecture NN 10_1101-2021_01_02_425006 268 25 of of IN 10_1101-2021_01_02_425006 268 26 the the DT 10_1101-2021_01_02_425006 268 27 bacterial bacterial JJ 10_1101-2021_01_02_425006 268 28 genome genome NN 10_1101-2021_01_02_425006 268 29 . . . 
10_1101-2021_01_02_425006 269 1 406 406 CD 10_1101-2021_01_02_425006 269 2 In in IN 10_1101-2021_01_02_425006 269 3 fact fact NN 10_1101-2021_01_02_425006 269 4 , , , 10_1101-2021_01_02_425006 269 5 the the DT 10_1101-2021_01_02_425006 269 6 ATU ATU NNP 10_1101-2021_01_02_425006 269 7 architecture architecture NN 10_1101-2021_01_02_425006 269 8 of of IN 10_1101-2021_01_02_425006 269 9 bacteria bacteria NNS 10_1101-2021_01_02_425006 269 10 is be VBZ 10_1101-2021_01_02_425006 269 11 much much RB 10_1101-2021_01_02_425006 269 12 more more RBR 10_1101-2021_01_02_425006 269 13 complex complex JJ 10_1101-2021_01_02_425006 269 14 than than IN 10_1101-2021_01_02_425006 269 15 that that DT 10_1101-2021_01_02_425006 269 16 determined determine VBN 10_1101-2021_01_02_425006 269 17 with with IN 10_1101-2021_01_02_425006 269 18 currently currently RB 10_1101-2021_01_02_425006 269 19 407 407 CD 
10_1101-2021_01_02_425006 270 25 used use VBD 10_1101-2021_01_02_425006 270 26 experimental experimental JJ 10_1101-2021_01_02_425006 270 27 techniques technique NNS 10_1101-2021_01_02_425006 270 28 . . . 10_1101-2021_01_02_425006 271 1 We -PRON- PRP 10_1101-2021_01_02_425006 271 2 investigated investigate VBD 10_1101-2021_01_02_425006 271 3 the the DT 10_1101-2021_01_02_425006 271 4 5’-end 5’-end CD 10_1101-2021_01_02_425006 271 5 genes gene NNS 10_1101-2021_01_02_425006 271 6 and and CC 10_1101-2021_01_02_425006 271 7 no no DT 10_1101-2021_01_02_425006 271 8 5’-end 5’-end CD 10_1101-2021_01_02_425006 271 9 genes gene NNS 10_1101-2021_01_02_425006 271 10 of of IN 10_1101-2021_01_02_425006 271 11 the the DT 10_1101-2021_01_02_425006 271 12 experimental experimental JJ 10_1101-2021_01_02_425006 271 13 408 408 CD 10_1101-2021_01_02_425006 271 14 ATUs atu NNS 10_1101-2021_01_02_425006 271 15 identified identify VBN 10_1101-2021_01_02_425006 271 16 by by IN 10_1101-2021_01_02_425006 271 17 SMRT SMRT NNP 10_1101-2021_01_02_425006 271 18 - - HYPH 10_1101-2021_01_02_425006 271 19 Cappable cappable JJ 10_1101-2021_01_02_425006 271 20 - - HYPH 10_1101-2021_01_02_425006 271 21 seq seq NN 10_1101-2021_01_02_425006 271 22 ( ( -LRB- 10_1101-2021_01_02_425006 271 23 6 6 CD 10_1101-2021_01_02_425006 271 24 ) ) -RRB- 10_1101-2021_01_02_425006 271 25 using use VBG 
10_1101-2021_01_02_425006 271 26 a a DT 10_1101-2021_01_02_425006 271 27 combination combination NN 10_1101-2021_01_02_425006 271 28 of of IN 10_1101-2021_01_02_425006 271 29 experimental experimental JJ 10_1101-2021_01_02_425006 271 30 TSSs tss NNS 10_1101-2021_01_02_425006 271 31 from from IN 10_1101-2021_01_02_425006 271 32 409 409 CD 10_1101-2021_01_02_425006 271 33 RegulonDB regulondb NN 10_1101-2021_01_02_425006 271 34 ( ( -LRB- 10_1101-2021_01_02_425006 271 35 19 19 CD 10_1101-2021_01_02_425006 271 36 ) ) -RRB- 10_1101-2021_01_02_425006 271 37 , , , 10_1101-2021_01_02_425006 271 38 dRNA drna NN 10_1101-2021_01_02_425006 271 39 - - HYPH 10_1101-2021_01_02_425006 271 40 seq seq NN 10_1101-2021_01_02_425006 271 41 ( ( -LRB- 10_1101-2021_01_02_425006 271 42 14 14 CD 10_1101-2021_01_02_425006 271 43 ) ) -RRB- 10_1101-2021_01_02_425006 271 44 , , , 10_1101-2021_01_02_425006 271 45 Cappable cappable JJ 10_1101-2021_01_02_425006 271 46 - - HYPH 10_1101-2021_01_02_425006 271 47 seq seq NN 10_1101-2021_01_02_425006 271 48 ( ( -LRB- 10_1101-2021_01_02_425006 271 49 13 13 CD 10_1101-2021_01_02_425006 271 50 ) ) -RRB- 10_1101-2021_01_02_425006 271 51 , , , 10_1101-2021_01_02_425006 271 52 and and CC 10_1101-2021_01_02_425006 271 53 SEnd SEnd NNP 10_1101-2021_01_02_425006 271 54 - - HYPH 10_1101-2021_01_02_425006 271 55 seq seq NNP 10_1101-2021_01_02_425006 271 56 ( ( -LRB- 10_1101-2021_01_02_425006 271 57 7 7 CD 10_1101-2021_01_02_425006 271 58 ) ) -RRB- 10_1101-2021_01_02_425006 271 59 . . . 
10_1101-2021_01_02_425006 272 1 As as IN 10_1101-2021_01_02_425006 272 2 a a DT 10_1101-2021_01_02_425006 272 3 result result NN 10_1101-2021_01_02_425006 272 4 , , , 10_1101-2021_01_02_425006 272 5 we -PRON- PRP 10_1101-2021_01_02_425006 272 6 found find VBD 10_1101-2021_01_02_425006 272 7 that that IN 10_1101-2021_01_02_425006 272 8 the the DT 10_1101-2021_01_02_425006 272 9 410 410 CD 10_1101-2021_01_02_425006 272 10 proportion proportion NN 10_1101-2021_01_02_425006 272 11 of of IN 10_1101-2021_01_02_425006 272 12 5’-end 5’-end CD 10_1101-2021_01_02_425006 272 13 genes gene NNS 10_1101-2021_01_02_425006 272 14 ( ( -LRB- 10_1101-2021_01_02_425006 272 15 99 99 CD 10_1101-2021_01_02_425006 272 16 % % NN 10_1101-2021_01_02_425006 272 17 ) ) -RRB- 10_1101-2021_01_02_425006 272 18 validated validate VBN 10_1101-2021_01_02_425006 272 19 by by IN 10_1101-2021_01_02_425006 272 20 experimental experimental JJ 10_1101-2021_01_02_425006 272 21 TSSs tss NNS 10_1101-2021_01_02_425006 272 22 was be VBD 10_1101-2021_01_02_425006 272 23 not not RB 10_1101-2021_01_02_425006 272 24 significantly significantly RB 10_1101-2021_01_02_425006 272 25 different different JJ 10_1101-2021_01_02_425006 272 26 from from IN 10_1101-2021_01_02_425006 272 27 411 411 CD 10_1101-2021_01_02_425006 272 28 that that DT 10_1101-2021_01_02_425006 272 29 of of IN 10_1101-2021_01_02_425006 272 30 no no DT 10_1101-2021_01_02_425006 272 31 5’-end 5’-end CD 10_1101-2021_01_02_425006 272 32 genes gene NNS 10_1101-2021_01_02_425006 272 33 ( ( -LRB- 10_1101-2021_01_02_425006 272 34 92 92 CD 10_1101-2021_01_02_425006 272 35 % % NN 10_1101-2021_01_02_425006 272 36 ) ) -RRB- 10_1101-2021_01_02_425006 272 37 . . . 
10_1101-2021_01_02_425006 273 1 The the DT 10_1101-2021_01_02_425006 273 2 high high JJ 10_1101-2021_01_02_425006 273 3 percentage percentage NN 10_1101-2021_01_02_425006 273 4 of of IN 10_1101-2021_01_02_425006 273 5 no no DT 10_1101-2021_01_02_425006 273 6 5’-end 5’-end CD 10_1101-2021_01_02_425006 273 7 genes gene NNS 10_1101-2021_01_02_425006 273 8 validated validate VBN 10_1101-2021_01_02_425006 273 9 by by IN 10_1101-2021_01_02_425006 273 10 experimental experimental JJ 10_1101-2021_01_02_425006 273 11 TSSs tss NNS 10_1101-2021_01_02_425006 273 12 412 412 CD 10_1101-2021_01_02_425006 273 13 implied imply VBD 10_1101-2021_01_02_425006 273 14 that that IN 10_1101-2021_01_02_425006 273 15 the the DT 10_1101-2021_01_02_425006 273 16 ATUs atu NNS 10_1101-2021_01_02_425006 273 17 identified identify VBN 10_1101-2021_01_02_425006 273 18 by by IN 10_1101-2021_01_02_425006 273 19 experimental experimental JJ 10_1101-2021_01_02_425006 273 20 techniques technique NNS 10_1101-2021_01_02_425006 273 21 are be VBP 10_1101-2021_01_02_425006 273 22 only only RB 10_1101-2021_01_02_425006 273 23 a a DT 10_1101-2021_01_02_425006 273 24 small small JJ 10_1101-2021_01_02_425006 273 25 proportion proportion NN 10_1101-2021_01_02_425006 273 26 of of IN 10_1101-2021_01_02_425006 273 27 the the DT 10_1101-2021_01_02_425006 273 28 413 413 CD 10_1101-2021_01_02_425006 273 29 comprehensive comprehensive JJ 10_1101-2021_01_02_425006 273 30 ATUs atu NNS 10_1101-2021_01_02_425006 273 31 in in IN 10_1101-2021_01_02_425006 273 32 bacterial bacterial JJ 10_1101-2021_01_02_425006 273 33 organisms organism NNS 10_1101-2021_01_02_425006 273 34 due due JJ 10_1101-2021_01_02_425006 273 35 to to IN 10_1101-2021_01_02_425006 273 36 the the DT 10_1101-2021_01_02_425006 273 37 dynamic dynamic JJ 10_1101-2021_01_02_425006 273 38 mechanisms mechanism NNS 10_1101-2021_01_02_425006 273 39 of of IN 10_1101-2021_01_02_425006 273 40 ATUs atu NNS 10_1101-2021_01_02_425006 273 41 . . . 
10_1101-2021_01_02_425006 274 1 These these DT 10_1101-2021_01_02_425006 274 2 results result NNS 10_1101-2021_01_02_425006 274 3 414 414 CD 10_1101-2021_01_02_425006 274 4 further further RB 10_1101-2021_01_02_425006 274 5 verified verify VBD 10_1101-2021_01_02_425006 274 6 the the DT 10_1101-2021_01_02_425006 274 7 necessity necessity NN 10_1101-2021_01_02_425006 274 8 of of IN 10_1101-2021_01_02_425006 274 9 developing develop VBG 10_1101-2021_01_02_425006 274 10 robust robust JJ 10_1101-2021_01_02_425006 274 11 computational computational JJ 10_1101-2021_01_02_425006 274 12 methods method NNS 10_1101-2021_01_02_425006 274 13 for for IN 10_1101-2021_01_02_425006 274 14 ATU ATU NNP 10_1101-2021_01_02_425006 274 15 identification identification NN 10_1101-2021_01_02_425006 274 16 . . . 10_1101-2021_01_02_425006 275 1 415 415 CD 10_1101-2021_01_02_425006 275 2 SeqATU seqatu IN 10_1101-2021_01_02_425006 275 3 not not RB 10_1101-2021_01_02_425006 275 4 only only RB 10_1101-2021_01_02_425006 275 5 provides provide VBZ 10_1101-2021_01_02_425006 275 6 a a DT 10_1101-2021_01_02_425006 275 7 powerful powerful JJ 10_1101-2021_01_02_425006 275 8 tool tool NN 10_1101-2021_01_02_425006 275 9 to to TO 10_1101-2021_01_02_425006 275 10 understand understand VB 10_1101-2021_01_02_425006 275 11 the the DT 10_1101-2021_01_02_425006 275 12 transcription transcription NN 10_1101-2021_01_02_425006 275 13 mechanism mechanism NN 10_1101-2021_01_02_425006 275 14 of of IN 10_1101-2021_01_02_425006 275 15 bacteria bacteria NNS 10_1101-2021_01_02_425006 275 16 but but CC 10_1101-2021_01_02_425006 275 17 416 416 CD 10_1101-2021_01_02_425006 275 18 also also RB 10_1101-2021_01_02_425006 275 19 provides provide VBZ 10_1101-2021_01_02_425006 275 20 a a DT 10_1101-2021_01_02_425006 275 21 fundamental fundamental JJ 10_1101-2021_01_02_425006 275 22 tool tool NN 10_1101-2021_01_02_425006 275 23 to to TO 10_1101-2021_01_02_425006 275 24 guide guide VB 10_1101-2021_01_02_425006 275 25 the the DT 
10_1101-2021_01_02_425006 275 26 reconstruction reconstruction NN 10_1101-2021_01_02_425006 275 27 of of IN 10_1101-2021_01_02_425006 275 28 a a DT 10_1101-2021_01_02_425006 275 29 genome genome JJ 10_1101-2021_01_02_425006 275 30 - - HYPH 10_1101-2021_01_02_425006 275 31 scale scale NN 10_1101-2021_01_02_425006 275 32 transcriptional transcriptional JJ 10_1101-2021_01_02_425006 275 33 regulatory regulatory JJ 10_1101-2021_01_02_425006 275 34 417 417 CD 10_1101-2021_01_02_425006 275 35 network network NN 10_1101-2021_01_02_425006 275 36 . . . 10_1101-2021_01_02_425006 276 1 First first RB 10_1101-2021_01_02_425006 276 2 , , , 10_1101-2021_01_02_425006 276 3 the the DT 10_1101-2021_01_02_425006 276 4 ATU ATU NNP 10_1101-2021_01_02_425006 276 5 structure structure NN 10_1101-2021_01_02_425006 276 6 can can MD 10_1101-2021_01_02_425006 276 7 help help VB 10_1101-2021_01_02_425006 276 8 us -PRON- PRP 10_1101-2021_01_02_425006 276 9 to to TO 10_1101-2021_01_02_425006 276 10 make make VB 10_1101-2021_01_02_425006 276 11 new new JJ 10_1101-2021_01_02_425006 276 12 functional functional JJ 10_1101-2021_01_02_425006 276 13 predictions prediction NNS 10_1101-2021_01_02_425006 276 14 , , , 10_1101-2021_01_02_425006 276 15 as as IN 10_1101-2021_01_02_425006 276 16 genes gene NNS 10_1101-2021_01_02_425006 276 17 in in IN 10_1101-2021_01_02_425006 276 18 an an DT 10_1101-2021_01_02_425006 276 19 ATU ATU NNP 10_1101-2021_01_02_425006 276 20 418 418 CD 10_1101-2021_01_02_425006 276 21 tend tend VBP 10_1101-2021_01_02_425006 276 22 to to TO 10_1101-2021_01_02_425006 276 23 have have VB 10_1101-2021_01_02_425006 276 24 related relate VBN 10_1101-2021_01_02_425006 276 25 functions function NNS 10_1101-2021_01_02_425006 276 26 . . . 
10_1101-2021_01_02_425006 277 1 Second second JJ 10_1101-2021_01_02_425006 277 2 , , , 10_1101-2021_01_02_425006 277 3 ATUs atu NNS 10_1101-2021_01_02_425006 277 4 can can MD 10_1101-2021_01_02_425006 277 5 elucidate elucidate VB 10_1101-2021_01_02_425006 277 6 condition condition NN 10_1101-2021_01_02_425006 277 7 - - HYPH 10_1101-2021_01_02_425006 277 8 specific specific JJ 10_1101-2021_01_02_425006 277 9 uses use NNS 10_1101-2021_01_02_425006 277 10 of of IN 10_1101-2021_01_02_425006 277 11 alternative alternative JJ 10_1101-2021_01_02_425006 277 12 sigma sigma NNP 10_1101-2021_01_02_425006 277 13 419 419 CD 10_1101-2021_01_02_425006 277 14 factors factor NNS 10_1101-2021_01_02_425006 277 15 ( ( -LRB- 10_1101-2021_01_02_425006 277 16 8 8 CD 10_1101-2021_01_02_425006 277 17 , , , 10_1101-2021_01_02_425006 277 18 46 46 CD 10_1101-2021_01_02_425006 277 19 ) ) -RRB- 10_1101-2021_01_02_425006 277 20 . . . 10_1101-2021_01_02_425006 278 1 For for IN 10_1101-2021_01_02_425006 278 2 example example NN 10_1101-2021_01_02_425006 278 3 , , , 10_1101-2021_01_02_425006 278 4 the the DT 10_1101-2021_01_02_425006 278 5 thrLABC thrLABC NNP 10_1101-2021_01_02_425006 278 6 operon operon NN 10_1101-2021_01_02_425006 278 7 is be VBZ 10_1101-2021_01_02_425006 278 8 regulated regulate VBN 10_1101-2021_01_02_425006 278 9 by by IN 10_1101-2021_01_02_425006 278 10 transcriptional transcriptional JJ 10_1101-2021_01_02_425006 278 11 attenuation attenuation NN 10_1101-2021_01_02_425006 278 12 . . . 10_1101-2021_01_02_425006 279 1 Totsuka Totsuka NNP 10_1101-2021_01_02_425006 279 2 et et NNP 10_1101-2021_01_02_425006 279 3 420 420 CD 10_1101-2021_01_02_425006 279 4 al al NNP 10_1101-2021_01_02_425006 279 5 . . . 
10_1101-2021_01_02_425006 280 1 found find VBD 10_1101-2021_01_02_425006 280 2 that that IN 10_1101-2021_01_02_425006 280 3 under under IN 10_1101-2021_01_02_425006 280 4 the the DT 10_1101-2021_01_02_425006 280 5 log log NN 10_1101-2021_01_02_425006 280 6 phase phase NN 10_1101-2021_01_02_425006 280 7 growth growth NN 10_1101-2021_01_02_425006 280 8 condition condition NN 10_1101-2021_01_02_425006 280 9 , , , 10_1101-2021_01_02_425006 280 10 the the DT 10_1101-2021_01_02_425006 280 11 thrLABC thrLABC NNP 10_1101-2021_01_02_425006 280 12 operon operon NN 10_1101-2021_01_02_425006 280 13 is be VBZ 10_1101-2021_01_02_425006 280 14 the the DT 10_1101-2021_01_02_425006 280 15 only only JJ 10_1101-2021_01_02_425006 280 16 transcript transcript NN 10_1101-2021_01_02_425006 280 17 , , , 10_1101-2021_01_02_425006 280 18 while while IN 10_1101-2021_01_02_425006 280 19 421 421 CD 10_1101-2021_01_02_425006 280 20 two two CD 10_1101-2021_01_02_425006 280 21 transcripts transcript NNS 10_1101-2021_01_02_425006 280 22 are be VBP 10_1101-2021_01_02_425006 280 23 found find VBN 10_1101-2021_01_02_425006 280 24 under under IN 10_1101-2021_01_02_425006 280 25 stationary stationary JJ 10_1101-2021_01_02_425006 280 26 phase phase NN 10_1101-2021_01_02_425006 280 27 growth growth NN 10_1101-2021_01_02_425006 280 28 condition condition NN 10_1101-2021_01_02_425006 280 29 , , , 10_1101-2021_01_02_425006 280 30 the the DT 10_1101-2021_01_02_425006 280 31 thrLABC thrLABC NNP 10_1101-2021_01_02_425006 280 32 and and CC 10_1101-2021_01_02_425006 280 33 thrBC thrbc ADD 10_1101-2021_01_02_425006 280 34 . . . 
10_1101-2021_01_02_425006 281 1 As as IN 10_1101-2021_01_02_425006 281 2 validated validate VBN 10_1101-2021_01_02_425006 281 3 422 422 CD 10_1101-2021_01_02_425006 281 4 experimentally experimentally RB 10_1101-2021_01_02_425006 281 5 , , , 10_1101-2021_01_02_425006 281 6 � � ADD 10_1101-2021_01_02_425006 281 7 � � NNP 10_1101-2021_01_02_425006 281 8 can can MD 10_1101-2021_01_02_425006 281 9 regulate regulate VB 10_1101-2021_01_02_425006 281 10 the the DT 10_1101-2021_01_02_425006 281 11 additional additional JJ 10_1101-2021_01_02_425006 281 12 promoter promoter NN 10_1101-2021_01_02_425006 281 13 located locate VBN 10_1101-2021_01_02_425006 281 14 in in IN 10_1101-2021_01_02_425006 281 15 front front NN 10_1101-2021_01_02_425006 281 16 of of IN 10_1101-2021_01_02_425006 281 17 thrB thrb CD 10_1101-2021_01_02_425006 281 18 under under IN 10_1101-2021_01_02_425006 281 19 the the DT 10_1101-2021_01_02_425006 281 20 stationary stationary JJ 10_1101-2021_01_02_425006 281 21 423 423 CD 10_1101-2021_01_02_425006 281 22 phase phase NN 10_1101-2021_01_02_425006 281 23 growth growth NN 10_1101-2021_01_02_425006 281 24 condition condition NN 10_1101-2021_01_02_425006 281 25 and and CC 10_1101-2021_01_02_425006 281 26 then then RB 10_1101-2021_01_02_425006 281 27 separately separately RB 10_1101-2021_01_02_425006 281 28 regulate regulate VB 10_1101-2021_01_02_425006 281 29 thrBC thrbc ADD 10_1101-2021_01_02_425006 281 30 , , , 10_1101-2021_01_02_425006 281 31 which which WDT 10_1101-2021_01_02_425006 281 32 elucidates elucidate VBZ 10_1101-2021_01_02_425006 281 33 the the DT 10_1101-2021_01_02_425006 281 34 condition condition NN 10_1101-2021_01_02_425006 281 35 - - HYPH 10_1101-2021_01_02_425006 281 36 specific specific JJ 10_1101-2021_01_02_425006 281 37 uses use VBZ 10_1101-2021_01_02_425006 281 38 424 424 CD 10_1101-2021_01_02_425006 281 39 of of IN 10_1101-2021_01_02_425006 281 40 � � NNP 10_1101-2021_01_02_425006 281 41 � � NNP 10_1101-2021_01_02_425006 281 42 ( ( 
-LRB- 10_1101-2021_01_02_425006 281 43 8) 8) CD 10_1101-2021_01_02_425006 281 44 . . . 10_1101-2021_01_02_425006 282 1 Third third JJ 10_1101-2021_01_02_425006 282 2 , , , 10_1101-2021_01_02_425006 282 3 understanding understand VBG 10_1101-2021_01_02_425006 282 4 the the DT 10_1101-2021_01_02_425006 282 5 ATU ATU NNP 10_1101-2021_01_02_425006 282 6 structure structure NN 10_1101-2021_01_02_425006 282 7 is be VBZ 10_1101-2021_01_02_425006 282 8 of of IN 10_1101-2021_01_02_425006 282 9 great great JJ 10_1101-2021_01_02_425006 282 10 help help NN 10_1101-2021_01_02_425006 282 11 to to TO 10_1101-2021_01_02_425006 282 12 construct construct VB 10_1101-2021_01_02_425006 282 13 transcriptional transcriptional JJ 10_1101-2021_01_02_425006 282 14 and and CC 10_1101-2021_01_02_425006 282 15 425 425 CD 10_1101-2021_01_02_425006 282 16 translation translation NN 10_1101-2021_01_02_425006 282 17 regulatory regulatory JJ 10_1101-2021_01_02_425006 282 18 networks network NNS 10_1101-2021_01_02_425006 282 19 , , , 10_1101-2021_01_02_425006 282 20 such such JJ 10_1101-2021_01_02_425006 282 21 as as IN 10_1101-2021_01_02_425006 282 22 for for IN 10_1101-2021_01_02_425006 282 23 the the DT 10_1101-2021_01_02_425006 282 24 construction construction NN 10_1101-2021_01_02_425006 282 25 of of IN 10_1101-2021_01_02_425006 282 26 the the DT 10_1101-2021_01_02_425006 282 27 σ σ NNP 10_1101-2021_01_02_425006 282 28 - - HYPH 10_1101-2021_01_02_425006 282 29 TUG TUG NNP 10_1101-2021_01_02_425006 282 30 ( ( -LRB- 10_1101-2021_01_02_425006 282 31 σ σ NN 10_1101-2021_01_02_425006 282 32 - - HYPH 10_1101-2021_01_02_425006 282 33 factor factor NN 10_1101-2021_01_02_425006 282 34 - - HYPH 10_1101-2021_01_02_425006 282 35 transcription transcription NN 10_1101-2021_01_02_425006 282 36 unit unit NN 10_1101-2021_01_02_425006 282 37 426 426 CD 
10_1101-2021_01_02_425006 283 25 gene gene NN 10_1101-2021_01_02_425006 283 26 ) ) -RRB- 10_1101-2021_01_02_425006 283 27 network network NN 10_1101-2021_01_02_425006 283 28 ( ( -LRB- 10_1101-2021_01_02_425006 283 29 47 47 CD 10_1101-2021_01_02_425006 283 30 ) ) -RRB- 10_1101-2021_01_02_425006 283 31 . . . 
10_1101-2021_01_02_425006 284 1 The the DT 10_1101-2021_01_02_425006 284 2 transcription transcription NN 10_1101-2021_01_02_425006 284 3 regulatory regulatory JJ 10_1101-2021_01_02_425006 284 4 network network NN 10_1101-2021_01_02_425006 284 5 consists consist VBZ 10_1101-2021_01_02_425006 284 6 of of IN 10_1101-2021_01_02_425006 284 7 nodes node NNS 10_1101-2021_01_02_425006 284 8 ( ( -LRB- 10_1101-2021_01_02_425006 284 9 ATU ATU NNP 10_1101-2021_01_02_425006 284 10 and and CC 10_1101-2021_01_02_425006 284 11 regulatory regulatory JJ 10_1101-2021_01_02_425006 284 12 427 427 CD 10_1101-2021_01_02_425006 284 13 proteins protein NNS 10_1101-2021_01_02_425006 284 14 ) ) -RRB- 10_1101-2021_01_02_425006 284 15 and and CC 10_1101-2021_01_02_425006 284 16 links link NNS 10_1101-2021_01_02_425006 284 17 ( ( -LRB- 10_1101-2021_01_02_425006 284 18 interactions interaction NNS 10_1101-2021_01_02_425006 284 19 ) ) -RRB- 10_1101-2021_01_02_425006 284 20 ( ( -LRB- 10_1101-2021_01_02_425006 284 21 48 48 CD 10_1101-2021_01_02_425006 284 22 ) ) -RRB- 10_1101-2021_01_02_425006 284 23 , , , 10_1101-2021_01_02_425006 284 24 and and CC 10_1101-2021_01_02_425006 284 25 the the DT 10_1101-2021_01_02_425006 284 26 comprehensive comprehensive JJ 10_1101-2021_01_02_425006 284 27 ATU ATU NNP 10_1101-2021_01_02_425006 284 28 structure structure NN 10_1101-2021_01_02_425006 284 29 can can MD 10_1101-2021_01_02_425006 284 30 provide provide VB 10_1101-2021_01_02_425006 284 31 a a DT 10_1101-2021_01_02_425006 284 32 nearly nearly RB 10_1101-2021_01_02_425006 284 33 428 428 CD 10_1101-2021_01_02_425006 284 34 complete complete JJ 10_1101-2021_01_02_425006 284 35 set set NN 10_1101-2021_01_02_425006 284 36 of of IN 10_1101-2021_01_02_425006 284 37 nodes node NNS 10_1101-2021_01_02_425006 284 38 , , , 10_1101-2021_01_02_425006 284 39 which which WDT 10_1101-2021_01_02_425006 284 40 can can MD 10_1101-2021_01_02_425006 284 41 improve improve VB 10_1101-2021_01_02_425006 284 42 the the DT 
10_1101-2021_01_02_425006 284 43 accuracy accuracy NN 10_1101-2021_01_02_425006 284 44 of of IN 10_1101-2021_01_02_425006 284 45 regulatory regulatory JJ 10_1101-2021_01_02_425006 284 46 prediction prediction NN 10_1101-2021_01_02_425006 284 47 . . . 10_1101-2021_01_02_425006 285 1 429 429 CD 10_1101-2021_01_02_425006 285 2 Although although IN 10_1101-2021_01_02_425006 285 3 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 285 4 has have VBZ 10_1101-2021_01_02_425006 285 5 obtained obtain VBN 10_1101-2021_01_02_425006 285 6 satisfactory satisfactory JJ 10_1101-2021_01_02_425006 285 7 predicted predict VBN 10_1101-2021_01_02_425006 285 8 results result NNS 10_1101-2021_01_02_425006 285 9 , , , 10_1101-2021_01_02_425006 285 10 there there EX 10_1101-2021_01_02_425006 285 11 are be VBP 10_1101-2021_01_02_425006 285 12 still still RB 10_1101-2021_01_02_425006 285 13 several several JJ 10_1101-2021_01_02_425006 285 14 challenges challenge NNS 10_1101-2021_01_02_425006 285 15 430 430 CD 10_1101-2021_01_02_425006 285 16 regarding regard VBG 10_1101-2021_01_02_425006 285 17 the the DT 10_1101-2021_01_02_425006 285 18 computational computational JJ 10_1101-2021_01_02_425006 285 19 prediction prediction NN 10_1101-2021_01_02_425006 285 20 of of IN 10_1101-2021_01_02_425006 285 21 ATUs atu NNS 10_1101-2021_01_02_425006 285 22 . . . 
10_1101-2021_01_02_425006 286 1 On on IN 10_1101-2021_01_02_425006 286 2 the the DT 10_1101-2021_01_02_425006 286 3 one one CD 10_1101-2021_01_02_425006 286 4 hand hand NN 10_1101-2021_01_02_425006 286 5 , , , 10_1101-2021_01_02_425006 286 6 due due IN 10_1101-2021_01_02_425006 286 7 to to IN 10_1101-2021_01_02_425006 286 8 the the DT 10_1101-2021_01_02_425006 286 9 influence influence NN 10_1101-2021_01_02_425006 286 10 of of IN 10_1101-2021_01_02_425006 286 11 the the DT 10_1101-2021_01_02_425006 286 12 3 3 CD 10_1101-2021_01_02_425006 286 13 ’ ' '' 10_1101-2021_01_02_425006 286 14 431 431 CD 10_1101-2021_01_02_425006 286 15 untranslated untranslated JJ 10_1101-2021_01_02_425006 286 16 region region NN 10_1101-2021_01_02_425006 286 17 ( ( -LRB- 10_1101-2021_01_02_425006 286 18 UTR UTR NNP 10_1101-2021_01_02_425006 286 19 ) ) -RRB- 10_1101-2021_01_02_425006 286 20 and and CC 10_1101-2021_01_02_425006 286 21 5 5 CD 10_1101-2021_01_02_425006 286 22 ’ ' '' 10_1101-2021_01_02_425006 286 23 untranslated untranslated JJ 10_1101-2021_01_02_425006 286 24 region region NN 10_1101-2021_01_02_425006 286 25 ( ( -LRB- 10_1101-2021_01_02_425006 286 26 UTR UTR NNP 10_1101-2021_01_02_425006 286 27 ) ) -RRB- 10_1101-2021_01_02_425006 286 28 in in IN 10_1101-2021_01_02_425006 286 29 the the DT 10_1101-2021_01_02_425006 286 30 intergenic intergenic JJ 10_1101-2021_01_02_425006 286 31 regions region NNS 10_1101-2021_01_02_425006 286 32 , , , 10_1101-2021_01_02_425006 286 33 the the DT 10_1101-2021_01_02_425006 286 34 expression expression NN 10_1101-2021_01_02_425006 286 35 432 432 CD 10_1101-2021_01_02_425006 286 36 value value NN 10_1101-2021_01_02_425006 286 37 of of IN 10_1101-2021_01_02_425006 286 38 intergenic intergenic JJ 10_1101-2021_01_02_425006 286 39 regions region NNS 10_1101-2021_01_02_425006 286 40 can can MD 10_1101-2021_01_02_425006 286 41 not not RB 10_1101-2021_01_02_425006 286 42 be be VB 10_1101-2021_01_02_425006 286 43 reproduced reproduce VBN 
10_1101-2021_01_02_425006 286 44 perfectly perfectly RB 10_1101-2021_01_02_425006 286 45 by by IN 10_1101-2021_01_02_425006 286 46 the the DT 10_1101-2021_01_02_425006 286 47 same same JJ 10_1101-2021_01_02_425006 286 48 calculation calculation NN 10_1101-2021_01_02_425006 286 49 used use VBN 10_1101-2021_01_02_425006 286 50 for for IN 10_1101-2021_01_02_425006 286 51 the the DT 10_1101-2021_01_02_425006 286 52 433 433 CD 10_1101-2021_01_02_425006 286 53 expression expression NN 10_1101-2021_01_02_425006 286 54 value value NN 10_1101-2021_01_02_425006 286 55 of of IN 10_1101-2021_01_02_425006 286 56 genetic genetic JJ 10_1101-2021_01_02_425006 286 57 regions region NNS 10_1101-2021_01_02_425006 286 58 . . . 10_1101-2021_01_02_425006 287 1 Without without IN 10_1101-2021_01_02_425006 287 2 accurate accurate JJ 10_1101-2021_01_02_425006 287 3 reproduction reproduction NN 10_1101-2021_01_02_425006 287 4 , , , 10_1101-2021_01_02_425006 287 5 it -PRON- PRP 10_1101-2021_01_02_425006 287 6 is be VBZ 10_1101-2021_01_02_425006 287 7 difficult difficult JJ 10_1101-2021_01_02_425006 287 8 to to TO 10_1101-2021_01_02_425006 287 9 obtain obtain VB 10_1101-2021_01_02_425006 287 10 the the DT 10_1101-2021_01_02_425006 287 11 best good JJS 10_1101-2021_01_02_425006 287 12 434 434 CD 10_1101-2021_01_02_425006 287 13 expression expression NN 10_1101-2021_01_02_425006 287 14 combination combination NN 10_1101-2021_01_02_425006 287 15 of of IN 10_1101-2021_01_02_425006 287 16 ATUs atu NNS 10_1101-2021_01_02_425006 287 17 by by IN 10_1101-2021_01_02_425006 287 18 the the DT 10_1101-2021_01_02_425006 287 19 programming programming NN 10_1101-2021_01_02_425006 287 20 model model NN 10_1101-2021_01_02_425006 287 21 based base VBN 10_1101-2021_01_02_425006 287 22 on on IN 10_1101-2021_01_02_425006 287 23 the the DT 10_1101-2021_01_02_425006 287 24 expression expression NN 10_1101-2021_01_02_425006 287 25 value value NN 10_1101-2021_01_02_425006 287 26 of of IN 10_1101-2021_01_02_425006 287 
27 genetic genetic JJ 10_1101-2021_01_02_425006 287 28 435 435 CD 10_1101-2021_01_02_425006 287 29 and and CC 10_1101-2021_01_02_425006 287 30 intergenic intergenic JJ 10_1101-2021_01_02_425006 287 31 regions region NNS 10_1101-2021_01_02_425006 287 32 . . . 10_1101-2021_01_02_425006 288 1 On on IN 10_1101-2021_01_02_425006 288 2 the the DT 10_1101-2021_01_02_425006 288 3 other other JJ 10_1101-2021_01_02_425006 288 4 hand hand NN 10_1101-2021_01_02_425006 288 5 , , , 10_1101-2021_01_02_425006 288 6 due due IN 10_1101-2021_01_02_425006 288 7 to to IN 10_1101-2021_01_02_425006 288 8 the the DT 10_1101-2021_01_02_425006 288 9 lack lack NN 10_1101-2021_01_02_425006 288 10 of of IN 10_1101-2021_01_02_425006 288 11 strand strand NN 10_1101-2021_01_02_425006 288 12 - - HYPH 10_1101-2021_01_02_425006 288 13 specific specific JJ 10_1101-2021_01_02_425006 288 14 RNA RNA NNP 10_1101-2021_01_02_425006 288 15 - - HYPH 10_1101-2021_01_02_425006 288 16 Seq seq NN 10_1101-2021_01_02_425006 288 17 data datum NNS 10_1101-2021_01_02_425006 288 18 , , , 10_1101-2021_01_02_425006 288 19 it -PRON- PRP 10_1101-2021_01_02_425006 288 20 is be VBZ 10_1101-2021_01_02_425006 288 21 difficult difficult JJ 10_1101-2021_01_02_425006 288 22 436 436 CD 10_1101-2021_01_02_425006 288 23 to to TO 10_1101-2021_01_02_425006 288 24 distinguish distinguish VB 10_1101-2021_01_02_425006 288 25 the the DT 10_1101-2021_01_02_425006 288 26 expression expression NN 10_1101-2021_01_02_425006 288 27 level level NN 10_1101-2021_01_02_425006 288 28 of of IN 10_1101-2021_01_02_425006 288 29 intergenic intergenic JJ 10_1101-2021_01_02_425006 288 30 regions region NNS 10_1101-2021_01_02_425006 288 31 between between IN 10_1101-2021_01_02_425006 288 32 two two CD 10_1101-2021_01_02_425006 288 33 consecutive consecutive JJ 10_1101-2021_01_02_425006 288 34 genes gene NNS 10_1101-2021_01_02_425006 288 35 on on IN 10_1101-2021_01_02_425006 288 36 the the DT 10_1101-2021_01_02_425006 288 37 same same JJ 
10_1101-2021_01_02_425006 288 38 437 437 CD 10_1101-2021_01_02_425006 288 39 strand strand NN 10_1101-2021_01_02_425006 288 40 derived derive VBN 10_1101-2021_01_02_425006 288 41 from from IN 10_1101-2021_01_02_425006 288 42 ATUs atu NNS 10_1101-2021_01_02_425006 288 43 containing contain VBG 10_1101-2021_01_02_425006 288 44 these these DT 10_1101-2021_01_02_425006 288 45 two two CD 10_1101-2021_01_02_425006 288 46 genes gene NNS 10_1101-2021_01_02_425006 288 47 or or CC 10_1101-2021_01_02_425006 288 48 antisense antisense NN 10_1101-2021_01_02_425006 288 49 RNAs rna NNS 10_1101-2021_01_02_425006 288 50 ( ( -LRB- 10_1101-2021_01_02_425006 288 51 asRNAs asrnas NN 10_1101-2021_01_02_425006 288 52 ) ) -RRB- 10_1101-2021_01_02_425006 288 53 ( ( -LRB- 10_1101-2021_01_02_425006 288 54 6 6 CD 10_1101-2021_01_02_425006 288 55 , , , 10_1101-2021_01_02_425006 288 56 49 49 CD 10_1101-2021_01_02_425006 288 57 ) ) -RRB- 10_1101-2021_01_02_425006 288 58 . . . 10_1101-2021_01_02_425006 289 1 All all DT 10_1101-2021_01_02_425006 289 2 of of IN 10_1101-2021_01_02_425006 289 3 these these DT 10_1101-2021_01_02_425006 289 4 438 438 CD 10_1101-2021_01_02_425006 289 5 challenges challenge NNS 10_1101-2021_01_02_425006 289 6 and and CC 10_1101-2021_01_02_425006 289 7 the the DT 10_1101-2021_01_02_425006 289 8 great great JJ 10_1101-2021_01_02_425006 289 9 significance significance NN 10_1101-2021_01_02_425006 289 10 of of IN 10_1101-2021_01_02_425006 289 11 ATU ATU NNP 10_1101-2021_01_02_425006 289 12 prediction prediction NN 10_1101-2021_01_02_425006 289 13 inspire inspire VBP 10_1101-2021_01_02_425006 289 14 and and CC 10_1101-2021_01_02_425006 289 15 encourage encourage VBP 10_1101-2021_01_02_425006 289 16 us -PRON- PRP 10_1101-2021_01_02_425006 289 17 to to TO 10_1101-2021_01_02_425006 289 18 discover discover VB 10_1101-2021_01_02_425006 289 19 more more JJR 10_1101-2021_01_02_425006 289 20 439 439 CD 10_1101-2021_01_02_425006 289 21 information information NN 
10_1101-2021_01_02_425006 289 22 to to TO 10_1101-2021_01_02_425006 289 23 determine determine VB 10_1101-2021_01_02_425006 289 24 the the DT 10_1101-2021_01_02_425006 289 25 ATU ATU NNP 10_1101-2021_01_02_425006 289 26 structure structure NN 10_1101-2021_01_02_425006 289 27 in in IN 10_1101-2021_01_02_425006 289 28 bacteria bacteria NNS 10_1101-2021_01_02_425006 289 29 . . . 10_1101-2021_01_02_425006 290 1 For for IN 10_1101-2021_01_02_425006 290 2 example example NN 10_1101-2021_01_02_425006 290 3 , , , 10_1101-2021_01_02_425006 290 4 we -PRON- PRP 10_1101-2021_01_02_425006 290 5 plan plan VBP 10_1101-2021_01_02_425006 290 6 to to TO 10_1101-2021_01_02_425006 290 7 add add VB 10_1101-2021_01_02_425006 290 8 high high JJ 10_1101-2021_01_02_425006 290 9 confidence confidence NN 10_1101-2021_01_02_425006 290 10 440 440 CD 10_1101-2021_01_02_425006 290 11 TSSs tss NNS 10_1101-2021_01_02_425006 290 12 and and CC 10_1101-2021_01_02_425006 290 13 TTSs tts NNS 10_1101-2021_01_02_425006 290 14 information information NN 10_1101-2021_01_02_425006 290 15 to to IN 10_1101-2021_01_02_425006 290 16 our -PRON- PRP$ 10_1101-2021_01_02_425006 290 17 programming programming NN 10_1101-2021_01_02_425006 290 18 model model NN 10_1101-2021_01_02_425006 290 19 in in IN 10_1101-2021_01_02_425006 290 20 the the DT 10_1101-2021_01_02_425006 290 21 future future NN 10_1101-2021_01_02_425006 290 22 . . . 
10_1101-2021_01_02_425006 291 1 Additionally additionally RB 10_1101-2021_01_02_425006 291 2 , , , 10_1101-2021_01_02_425006 291 3 since since IN 10_1101-2021_01_02_425006 291 4 the the DT 10_1101-2021_01_02_425006 291 5 microbiome microbiome NN 10_1101-2021_01_02_425006 291 6 441 441 CD 10_1101-2021_01_02_425006 291 7 is be VBZ 10_1101-2021_01_02_425006 291 8 increasingly increasingly RB 10_1101-2021_01_02_425006 291 9 recognized recognize VBN 10_1101-2021_01_02_425006 291 10 as as IN 10_1101-2021_01_02_425006 291 11 a a DT 10_1101-2021_01_02_425006 291 12 critical critical JJ 10_1101-2021_01_02_425006 291 13 component component NN 10_1101-2021_01_02_425006 291 14 in in IN 10_1101-2021_01_02_425006 291 15 human human JJ 10_1101-2021_01_02_425006 291 16 diseases disease NNS 10_1101-2021_01_02_425006 291 17 , , , 10_1101-2021_01_02_425006 291 18 such such JJ 10_1101-2021_01_02_425006 291 19 as as IN 10_1101-2021_01_02_425006 291 20 inflammatory inflammatory JJ 10_1101-2021_01_02_425006 291 21 bowel bowel NN 10_1101-2021_01_02_425006 291 22 442 442 CD 10_1101-2021_01_02_425006 291 23 disease disease NN 10_1101-2021_01_02_425006 291 24 ( ( -LRB- 10_1101-2021_01_02_425006 291 25 50 50 CD 10_1101-2021_01_02_425006 291 26 ) ) -RRB- 10_1101-2021_01_02_425006 291 27 , , , 10_1101-2021_01_02_425006 291 28 antibiotic antibiotic NNP 10_1101-2021_01_02_425006 291 29 - - HYPH 10_1101-2021_01_02_425006 291 30 associated associate VBN 10_1101-2021_01_02_425006 291 31 diarrhoea diarrhoea NNP 10_1101-2021_01_02_425006 291 32 ( ( -LRB- 10_1101-2021_01_02_425006 291 33 51 51 CD 10_1101-2021_01_02_425006 291 34 ) ) -RRB- 10_1101-2021_01_02_425006 291 35 , , , 10_1101-2021_01_02_425006 291 36 neurological neurological JJ 10_1101-2021_01_02_425006 291 37 disorders disorder NNS 10_1101-2021_01_02_425006 291 38 ( ( -LRB- 10_1101-2021_01_02_425006 291 39 52 52 CD 10_1101-2021_01_02_425006 291 40 ) ) -RRB- 10_1101-2021_01_02_425006 291 41 , , , 10_1101-2021_01_02_425006 291 42 and and CC 
10_1101-2021_01_02_425006 291 43 cancer cancer NN 10_1101-2021_01_02_425006 291 44 ( ( -LRB- 10_1101-2021_01_02_425006 291 45 53 53 CD 10_1101-2021_01_02_425006 291 46 ) ) -RRB- 10_1101-2021_01_02_425006 291 47 ( ( -LRB- 10_1101-2021_01_02_425006 291 48 54 54 CD 10_1101-2021_01_02_425006 291 49 ) ) -RRB- 10_1101-2021_01_02_425006 291 50 , , , 10_1101-2021_01_02_425006 291 51 443 443 CD 10_1101-2021_01_02_425006 291 52 predicting predict VBG 10_1101-2021_01_02_425006 291 53 new new JJ 10_1101-2021_01_02_425006 291 54 ATUs atu NNS 10_1101-2021_01_02_425006 291 55 of of IN 10_1101-2021_01_02_425006 291 56 uncultured uncultured JJ 10_1101-2021_01_02_425006 291 57 species specie NNS 10_1101-2021_01_02_425006 291 58 from from IN 10_1101-2021_01_02_425006 291 59 metagenomic metagenomic JJ 10_1101-2021_01_02_425006 291 60 and and CC 10_1101-2021_01_02_425006 291 61 metatranscriptomic metatranscriptomic JJ 10_1101-2021_01_02_425006 291 62 data datum NNS 10_1101-2021_01_02_425006 291 63 is be VBZ 10_1101-2021_01_02_425006 291 64 of of IN 10_1101-2021_01_02_425006 291 65 great great JJ 10_1101-2021_01_02_425006 291 66 444 444 CD 10_1101-2021_01_02_425006 291 67 significance significance NN 10_1101-2021_01_02_425006 291 68 in in IN 10_1101-2021_01_02_425006 291 69 uncovering uncover VBG 10_1101-2021_01_02_425006 291 70 new new JJ 10_1101-2021_01_02_425006 291 71 regulatory regulatory JJ 10_1101-2021_01_02_425006 291 72 pathway pathway NN 10_1101-2021_01_02_425006 291 73 and and CC 10_1101-2021_01_02_425006 291 74 metabolic metabolic JJ 10_1101-2021_01_02_425006 291 75 products product NNS 10_1101-2021_01_02_425006 291 76 during during IN 10_1101-2021_01_02_425006 291 77 the the DT 10_1101-2021_01_02_425006 291 78 development development NN 10_1101-2021_01_02_425006 291 79 of of IN 10_1101-2021_01_02_425006 291 80 445 445 CD 
10_1101-2021_01_02_425006 292 25 diseases disease NNS 10_1101-2021_01_02_425006 292 26 ( ( -LRB- 10_1101-2021_01_02_425006 292 27 55 55 CD 10_1101-2021_01_02_425006 292 28 ) ) -RRB- 10_1101-2021_01_02_425006 292 29 . . . 
10_1101-2021_01_02_425006 293 1 However however RB 10_1101-2021_01_02_425006 293 2 , , , 10_1101-2021_01_02_425006 293 3 due due IN 10_1101-2021_01_02_425006 293 4 to to IN 10_1101-2021_01_02_425006 293 5 a a DT 10_1101-2021_01_02_425006 293 6 majority majority NN 10_1101-2021_01_02_425006 293 7 of of IN 10_1101-2021_01_02_425006 293 8 species specie NNS 10_1101-2021_01_02_425006 293 9 with with IN 10_1101-2021_01_02_425006 293 10 unknown unknown JJ 10_1101-2021_01_02_425006 293 11 genomes genome NNS 10_1101-2021_01_02_425006 293 12 or or CC 10_1101-2021_01_02_425006 293 13 genome genome JJ 10_1101-2021_01_02_425006 293 14 annotations annotation NNS 10_1101-2021_01_02_425006 293 15 446 446 CD 10_1101-2021_01_02_425006 293 16 within within IN 10_1101-2021_01_02_425006 293 17 a a DT 10_1101-2021_01_02_425006 293 18 microbial microbial JJ 10_1101-2021_01_02_425006 293 19 community community NN 10_1101-2021_01_02_425006 293 20 , , , 10_1101-2021_01_02_425006 293 21 ATU ATU NNP 10_1101-2021_01_02_425006 293 22 prediction prediction NN 10_1101-2021_01_02_425006 293 23 on on IN 10_1101-2021_01_02_425006 293 24 metagenomics metagenomics NNP 10_1101-2021_01_02_425006 293 25 and and CC 10_1101-2021_01_02_425006 293 26 metatranscriptomics metatranscriptomics NNP 10_1101-2021_01_02_425006 293 27 is be VBZ 10_1101-2021_01_02_425006 293 28 still still RB 10_1101-2021_01_02_425006 293 29 a a DT 10_1101-2021_01_02_425006 293 30 447 447 CD 10_1101-2021_01_02_425006 293 31 challenging challenging NN 10_1101-2021_01_02_425006 293 32 task task NN 10_1101-2021_01_02_425006 293 33 , , , 10_1101-2021_01_02_425006 293 34 which which WDT 10_1101-2021_01_02_425006 293 35 encourage encourage VBP 10_1101-2021_01_02_425006 293 36 us -PRON- PRP 10_1101-2021_01_02_425006 293 37 to to TO 10_1101-2021_01_02_425006 293 38 pay pay VB 10_1101-2021_01_02_425006 293 39 more more JJR 10_1101-2021_01_02_425006 293 40 attention attention NN 10_1101-2021_01_02_425006 293 41 on on IN 
10_1101-2021_01_02_425006 293 42 it -PRON- PRP 10_1101-2021_01_02_425006 293 43 . . . 10_1101-2021_01_02_425006 294 1 448 448 CD 10_1101-2021_01_02_425006 294 2 REFERENCES reference NNS 10_1101-2021_01_02_425006 294 3 449 449 CD 10_1101-2021_01_02_425006 294 4 1 1 CD 10_1101-2021_01_02_425006 294 5 . . . 10_1101-2021_01_02_425006 295 1 F. F. NNP 10_1101-2021_01_02_425006 295 2 Jacob Jacob NNP 10_1101-2021_01_02_425006 295 3 , , , 10_1101-2021_01_02_425006 295 4 D. D. NNP 10_1101-2021_01_02_425006 295 5 Perrin Perrin NNP 10_1101-2021_01_02_425006 295 6 , , , 10_1101-2021_01_02_425006 295 7 C. C. NNP 10_1101-2021_01_02_425006 295 8 Sanchez Sanchez NNP 10_1101-2021_01_02_425006 295 9 , , , 10_1101-2021_01_02_425006 295 10 J. J. NNP 10_1101-2021_01_02_425006 295 11 Monod Monod NNP 10_1101-2021_01_02_425006 295 12 , , , 10_1101-2021_01_02_425006 295 13 Operon Operon NNP 10_1101-2021_01_02_425006 295 14 : : : 10_1101-2021_01_02_425006 295 15 a a DT 10_1101-2021_01_02_425006 295 16 group group NN 10_1101-2021_01_02_425006 295 17 of of IN 10_1101-2021_01_02_425006 295 18 genes gene NNS 10_1101-2021_01_02_425006 295 19 with with IN 10_1101-2021_01_02_425006 295 20 the the DT 10_1101-2021_01_02_425006 295 21 expression expression NN 10_1101-2021_01_02_425006 295 22 450 450 CD 10_1101-2021_01_02_425006 295 23 coordinated coordinate VBN 10_1101-2021_01_02_425006 295 24 by by IN 10_1101-2021_01_02_425006 295 25 an an DT 10_1101-2021_01_02_425006 295 26 operator operator NN 10_1101-2021_01_02_425006 295 27 . . . 10_1101-2021_01_02_425006 296 1 C C NNP 10_1101-2021_01_02_425006 296 2 R R NNP 10_1101-2021_01_02_425006 296 3 Hebd Hebd NNP 10_1101-2021_01_02_425006 296 4 . . . 10_1101-2021_01_02_425006 297 1 Seances seance NNS 10_1101-2021_01_02_425006 297 2 . . . 10_1101-2021_01_02_425006 298 1 Acad Acad NNS 10_1101-2021_01_02_425006 298 2 . . . 
10_1101-2021_01_02_425006 299 1 Sci Sci NNP 10_1101-2021_01_02_425006 299 2 250 250 CD 10_1101-2021_01_02_425006 299 3 , , , 10_1101-2021_01_02_425006 299 4 1727 1727 CD 10_1101-2021_01_02_425006 299 5 - - SYM 10_1101-2021_01_02_425006 299 6 1729 1729 CD 10_1101-2021_01_02_425006 299 7 ( ( -LRB- 10_1101-2021_01_02_425006 299 8 1960 1960 CD 10_1101-2021_01_02_425006 299 9 ) ) -RRB- 10_1101-2021_01_02_425006 299 10 . . . 10_1101-2021_01_02_425006 300 1 451 451 CD 10_1101-2021_01_02_425006 300 2 2 2 CD 10_1101-2021_01_02_425006 300 3 . . . 10_1101-2021_01_02_425006 301 1 F. F. NNP 10_1101-2021_01_02_425006 301 2 Jacob Jacob NNP 10_1101-2021_01_02_425006 301 3 , , , 10_1101-2021_01_02_425006 301 4 J. J. NNP 10_1101-2021_01_02_425006 301 5 Monod Monod NNP 10_1101-2021_01_02_425006 301 6 , , , 10_1101-2021_01_02_425006 301 7 Genetic genetic JJ 10_1101-2021_01_02_425006 301 8 regulatory regulatory JJ 10_1101-2021_01_02_425006 301 9 mechanisms mechanism NNS 10_1101-2021_01_02_425006 301 10 in in IN 10_1101-2021_01_02_425006 301 11 the the DT 10_1101-2021_01_02_425006 301 12 synthesis synthesis NN 10_1101-2021_01_02_425006 301 13 of of IN 10_1101-2021_01_02_425006 301 14 proteins protein NNS 10_1101-2021_01_02_425006 301 15 . . . 10_1101-2021_01_02_425006 302 1 J. J. NNP 10_1101-2021_01_02_425006 302 2 Mol Mol NNP 10_1101-2021_01_02_425006 302 3 . . . 10_1101-2021_01_02_425006 303 1 Biol Biol NNP 10_1101-2021_01_02_425006 303 2 . . . 10_1101-2021_01_02_425006 304 1 3 3 CD 10_1101-2021_01_02_425006 304 2 , , , 10_1101-2021_01_02_425006 304 3 452 452 CD 10_1101-2021_01_02_425006 304 4 318 318 CD 10_1101-2021_01_02_425006 304 5 - - SYM 10_1101-2021_01_02_425006 304 6 356 356 CD 10_1101-2021_01_02_425006 304 7 ( ( -LRB- 10_1101-2021_01_02_425006 304 8 1961 1961 CD 10_1101-2021_01_02_425006 304 9 ) ) -RRB- 10_1101-2021_01_02_425006 304 10 . . . 10_1101-2021_01_02_425006 305 1 453 453 CD 10_1101-2021_01_02_425006 305 2 3 3 CD 10_1101-2021_01_02_425006 305 3 . . . 
10_1101-2021_01_02_425006 306 1 Z. Z. NNP 10_1101-2021_01_02_425006 306 2 Liu Liu NNP 10_1101-2021_01_02_425006 306 3 , , , 10_1101-2021_01_02_425006 306 4 J. J. NNP 10_1101-2021_01_02_425006 306 5 Feng Feng NNP 10_1101-2021_01_02_425006 306 6 , , , 10_1101-2021_01_02_425006 306 7 B. B. NNP 10_1101-2021_01_02_425006 306 8 Yu Yu NNP 10_1101-2021_01_02_425006 306 9 , , , 10_1101-2021_01_02_425006 306 10 Q. Q. NNP 10_1101-2021_01_02_425006 306 11 Ma Ma NNP 10_1101-2021_01_02_425006 306 12 , , , 10_1101-2021_01_02_425006 306 13 B. B. NNP 10_1101-2021_01_02_425006 306 14 Liu Liu NNP 10_1101-2021_01_02_425006 306 15 , , , 10_1101-2021_01_02_425006 306 16 The the DT 10_1101-2021_01_02_425006 306 17 functional functional JJ 10_1101-2021_01_02_425006 306 18 determinants determinant NNS 10_1101-2021_01_02_425006 306 19 in in IN 10_1101-2021_01_02_425006 306 20 the the DT 10_1101-2021_01_02_425006 306 21 organization organization NN 10_1101-2021_01_02_425006 306 22 of of IN 10_1101-2021_01_02_425006 306 23 bacterial bacterial JJ 10_1101-2021_01_02_425006 306 24 454 454 CD 10_1101-2021_01_02_425006 306 25 genomes genome NNS 10_1101-2021_01_02_425006 306 26 . . . 10_1101-2021_01_02_425006 307 1 Brief brief JJ 10_1101-2021_01_02_425006 307 2 . . . 10_1101-2021_01_02_425006 308 1 Bioinform Bioinform NNP 10_1101-2021_01_02_425006 308 2 . . NNP 10_1101-2021_01_02_425006 308 3 , , , 10_1101-2021_01_02_425006 308 4 doi.org/10.1093/bib/bbaa1172 doi.org/10.1093/bib/bbaa1172 NN 10_1101-2021_01_02_425006 308 5 ( ( -LRB- 10_1101-2021_01_02_425006 308 6 2020 2020 CD 10_1101-2021_01_02_425006 308 7 ) ) -RRB- 10_1101-2021_01_02_425006 308 8 . . . 10_1101-2021_01_02_425006 309 1 455 455 CD 10_1101-2021_01_02_425006 309 2 4 4 CD 10_1101-2021_01_02_425006 309 3 . . . 10_1101-2021_01_02_425006 310 1 W.-C. W.-C. NNP 10_1101-2021_01_02_425006 310 2 Chou Chou NNP 10_1101-2021_01_02_425006 310 3 , , , 10_1101-2021_01_02_425006 310 4 Q. Q. 
NNP 10_1101-2021_01_02_425006 310 5 Ma Ma NNP 10_1101-2021_01_02_425006 310 6 , , , 10_1101-2021_01_02_425006 310 7 S. S. NNP 10_1101-2021_01_02_425006 310 8 Yang Yang NNP 10_1101-2021_01_02_425006 310 9 , , , 10_1101-2021_01_02_425006 310 10 S. S. NNP 10_1101-2021_01_02_425006 310 11 Cao Cao NNP 10_1101-2021_01_02_425006 310 12 , , , 10_1101-2021_01_02_425006 310 13 D. D. NNP 10_1101-2021_01_02_425006 310 14 M. M. NNP 10_1101-2021_01_02_425006 310 15 Klingeman Klingeman NNP 10_1101-2021_01_02_425006 310 16 , , , 10_1101-2021_01_02_425006 310 17 S. S. NNP 10_1101-2021_01_02_425006 310 18 D. D. NNP 10_1101-2021_01_02_425006 310 19 Brown Brown NNP 10_1101-2021_01_02_425006 310 20 , , , 10_1101-2021_01_02_425006 310 21 Y. Y. NNP 10_1101-2021_01_02_425006 310 22 Xu Xu NNP 10_1101-2021_01_02_425006 310 23 , , , 10_1101-2021_01_02_425006 310 24 Analysis Analysis NNP 10_1101-2021_01_02_425006 310 25 of of IN 10_1101-2021_01_02_425006 310 26 strand-456 strand-456 NNP 10_1101-2021_01_02_425006 310 27 specific specific JJ 10_1101-2021_01_02_425006 310 28 RNA RNA NNP 10_1101-2021_01_02_425006 310 29 - - HYPH 10_1101-2021_01_02_425006 310 30 seq seq NN 10_1101-2021_01_02_425006 310 31 data datum NNS 10_1101-2021_01_02_425006 310 32 using use VBG 10_1101-2021_01_02_425006 310 33 machine machine NN 10_1101-2021_01_02_425006 310 34 learning learning NN 10_1101-2021_01_02_425006 310 35 reveals reveal VBZ 10_1101-2021_01_02_425006 310 36 the the DT 10_1101-2021_01_02_425006 310 37 structures structure NNS 10_1101-2021_01_02_425006 310 38 of of IN 10_1101-2021_01_02_425006 310 39 transcription transcription NN 10_1101-2021_01_02_425006 310 40 units unit NNS 10_1101-2021_01_02_425006 310 41 in in IN 10_1101-2021_01_02_425006 310 42 457 457 CD 10_1101-2021_01_02_425006 310 43 Clostridium Clostridium NNP 10_1101-2021_01_02_425006 310 44 thermocellum thermocellum NN 10_1101-2021_01_02_425006 310 45 . . . 
10_1101-2021_01_02_425006 311 1 Nucleic Nucleic NNP 10_1101-2021_01_02_425006 311 2 Acids Acids NNPS 10_1101-2021_01_02_425006 311 3 Res Res NNP 10_1101-2021_01_02_425006 311 4 . . . 10_1101-2021_01_02_425006 312 1 43 43 CD 10_1101-2021_01_02_425006 312 2 , , , 10_1101-2021_01_02_425006 312 3 e67-e67 e67-e67 NNP 10_1101-2021_01_02_425006 312 4 ( ( -LRB- 10_1101-2021_01_02_425006 312 5 2015 2015 CD 10_1101-2021_01_02_425006 312 6 ) ) -RRB- 10_1101-2021_01_02_425006 312 7 . . . 10_1101-2021_01_02_425006 313 1 458 458 CD 10_1101-2021_01_02_425006 313 2 5 5 CD 10_1101-2021_01_02_425006 313 3 . . . 10_1101-2021_01_02_425006 314 1 S.-Y. s.-y. XX 10_1101-2021_01_02_425006 315 1 Niu niu UH 10_1101-2021_01_02_425006 315 2 , , , 10_1101-2021_01_02_425006 315 3 B. B. NNP 10_1101-2021_01_02_425006 315 4 Liu Liu NNP 10_1101-2021_01_02_425006 315 5 , , , 10_1101-2021_01_02_425006 315 6 Q. Q. NNP 10_1101-2021_01_02_425006 315 7 Ma Ma NNP 10_1101-2021_01_02_425006 315 8 , , , 10_1101-2021_01_02_425006 315 9 W.-C. W.-C. NNP 10_1101-2021_01_02_425006 315 10 Chou Chou NNP 10_1101-2021_01_02_425006 315 11 , , , 10_1101-2021_01_02_425006 315 12 rSeqTU rseqtu CD 10_1101-2021_01_02_425006 315 13 — — : 10_1101-2021_01_02_425006 315 14 a a DT 10_1101-2021_01_02_425006 315 15 machine machine NN 10_1101-2021_01_02_425006 315 16 - - HYPH 10_1101-2021_01_02_425006 315 17 learning learn VBG 10_1101-2021_01_02_425006 315 18 based base VBN 10_1101-2021_01_02_425006 315 19 R r NN 10_1101-2021_01_02_425006 315 20 package package NN 10_1101-2021_01_02_425006 315 21 for for IN 10_1101-2021_01_02_425006 315 22 459 459 CD 10_1101-2021_01_02_425006 315 23 prediction prediction NN 10_1101-2021_01_02_425006 315 24 of of IN 10_1101-2021_01_02_425006 315 25 bacterial bacterial JJ 10_1101-2021_01_02_425006 315 26 transcription transcription NN 10_1101-2021_01_02_425006 315 27 units unit NNS 10_1101-2021_01_02_425006 315 28 . . . 
10_1101-2021_01_02_425006 316 1 Frontiers frontier NNS 10_1101-2021_01_02_425006 316 2 in in IN 10_1101-2021_01_02_425006 316 3 genetics genetic NNS 10_1101-2021_01_02_425006 316 4 10 10 CD 10_1101-2021_01_02_425006 316 5 , , , 10_1101-2021_01_02_425006 316 6 374 374 CD 10_1101-2021_01_02_425006 316 7 ( ( -LRB- 10_1101-2021_01_02_425006 316 8 2019 2019 CD 10_1101-2021_01_02_425006 316 9 ) ) -RRB- 10_1101-2021_01_02_425006 316 10 . . . 10_1101-2021_01_02_425006 317 1 460 460 CD 10_1101-2021_01_02_425006 317 2 6 6 CD 10_1101-2021_01_02_425006 317 3 . . . 10_1101-2021_01_02_425006 318 1 B. B. NNP 10_1101-2021_01_02_425006 318 2 Yan Yan NNP 10_1101-2021_01_02_425006 318 3 , , , 10_1101-2021_01_02_425006 318 4 M. M. NNP 10_1101-2021_01_02_425006 318 5 Boitano Boitano NNP 10_1101-2021_01_02_425006 318 6 , , , 10_1101-2021_01_02_425006 318 7 T. T. NNP 10_1101-2021_01_02_425006 318 8 A. a. NN 10_1101-2021_01_02_425006 318 9 Clark Clark NNP 10_1101-2021_01_02_425006 318 10 , , , 10_1101-2021_01_02_425006 318 11 L. L. NNP 10_1101-2021_01_02_425006 318 12 Ettwiller Ettwiller NNP 10_1101-2021_01_02_425006 318 13 , , , 10_1101-2021_01_02_425006 318 14 SMRT SMRT NNP 10_1101-2021_01_02_425006 318 15 - - HYPH 10_1101-2021_01_02_425006 318 16 Cappable cappable JJ 10_1101-2021_01_02_425006 318 17 - - HYPH 10_1101-2021_01_02_425006 318 18 seq seq NN 10_1101-2021_01_02_425006 318 19 reveals reveal VBZ 10_1101-2021_01_02_425006 318 20 complex complex JJ 10_1101-2021_01_02_425006 318 21 operon operon NNP 10_1101-2021_01_02_425006 318 22 461 461 CD 10_1101-2021_01_02_425006 318 23 variants variant NNS 10_1101-2021_01_02_425006 318 24 in in IN 10_1101-2021_01_02_425006 318 25 bacteria bacteria NNS 10_1101-2021_01_02_425006 318 26 . . . 10_1101-2021_01_02_425006 319 1 Nat Nat NNP 10_1101-2021_01_02_425006 319 2 . . . 10_1101-2021_01_02_425006 320 1 Commun Commun VBN 10_1101-2021_01_02_425006 320 2 . . . 
10_1101-2021_01_02_425006 321 1 9 9 CD 10_1101-2021_01_02_425006 321 2 , , , 10_1101-2021_01_02_425006 321 3 3676 3676 CD 10_1101-2021_01_02_425006 321 4 ( ( -LRB- 10_1101-2021_01_02_425006 321 5 2018 2018 CD 10_1101-2021_01_02_425006 321 6 ) ) -RRB- 10_1101-2021_01_02_425006 321 7 . . . 10_1101-2021_01_02_425006 322 1 462 462 CD 10_1101-2021_01_02_425006 323 25 7 7 CD 10_1101-2021_01_02_425006 323 26 . . . 10_1101-2021_01_02_425006 324 1 X. X. NNP 10_1101-2021_01_02_425006 324 2 Ju Ju NNP 10_1101-2021_01_02_425006 324 3 , , , 10_1101-2021_01_02_425006 324 4 D. D. NNP 10_1101-2021_01_02_425006 324 5 Li Li NNP 10_1101-2021_01_02_425006 324 6 , , , 10_1101-2021_01_02_425006 324 7 S. S.
NNP 10_1101-2021_01_02_425006 324 8 Liu Liu NNP 10_1101-2021_01_02_425006 324 9 , , , 10_1101-2021_01_02_425006 324 10 Full Full NNP 10_1101-2021_01_02_425006 324 11 - - HYPH 10_1101-2021_01_02_425006 324 12 length length NN 10_1101-2021_01_02_425006 324 13 RNA RNA NNP 10_1101-2021_01_02_425006 324 14 profiling profiling NN 10_1101-2021_01_02_425006 324 15 reveals reveal VBZ 10_1101-2021_01_02_425006 324 16 pervasive pervasive JJ 10_1101-2021_01_02_425006 324 17 bidirectional bidirectional JJ 10_1101-2021_01_02_425006 324 18 transcription transcription NN 10_1101-2021_01_02_425006 324 19 463 463 CD 10_1101-2021_01_02_425006 324 20 terminators terminator NNS 10_1101-2021_01_02_425006 324 21 in in IN 10_1101-2021_01_02_425006 324 22 bacteria bacteria NNS 10_1101-2021_01_02_425006 324 23 . . . 10_1101-2021_01_02_425006 325 1 Nature nature NN 10_1101-2021_01_02_425006 325 2 microbiology microbiology NN 10_1101-2021_01_02_425006 325 3 4 4 CD 10_1101-2021_01_02_425006 325 4 , , , 10_1101-2021_01_02_425006 325 5 1907 1907 CD 10_1101-2021_01_02_425006 325 6 - - SYM 10_1101-2021_01_02_425006 325 7 1918 1918 CD 10_1101-2021_01_02_425006 325 8 ( ( -LRB- 10_1101-2021_01_02_425006 325 9 2019 2019 CD 10_1101-2021_01_02_425006 325 10 ) ) -RRB- 10_1101-2021_01_02_425006 325 11 . . . 10_1101-2021_01_02_425006 326 1 464 464 CD 10_1101-2021_01_02_425006 326 2 8 8 CD 10_1101-2021_01_02_425006 326 3 . . . 10_1101-2021_01_02_425006 327 1 K. K. NNP 10_1101-2021_01_02_425006 327 2 Totsuka Totsuka NNP 10_1101-2021_01_02_425006 327 3 , , , 10_1101-2021_01_02_425006 327 4 K. K. 
NNP 10_1101-2021_01_02_425006 327 5 Totsuka Totsuka NNP 10_1101-2021_01_02_425006 327 6 , , , 10_1101-2021_01_02_425006 327 7 The the DT 10_1101-2021_01_02_425006 327 8 Transcription Transcription NNP 10_1101-2021_01_02_425006 327 9 Unit Unit NNP 10_1101-2021_01_02_425006 327 10 Architecture Architecture NNP 10_1101-2021_01_02_425006 327 11 of of IN 10_1101-2021_01_02_425006 327 12 the the DT 10_1101-2021_01_02_425006 327 13 Escherichia Escherichia NNP 10_1101-2021_01_02_425006 327 14 Coli Coli NNPS 10_1101-2021_01_02_425006 327 15 Genome Genome NNP 10_1101-2021_01_02_425006 327 16 . . . 10_1101-2021_01_02_425006 328 1 Nat Nat NNP 10_1101-2021_01_02_425006 328 2 . . . 10_1101-2021_01_02_425006 329 1 465 465 CD 10_1101-2021_01_02_425006 329 2 Biotechnol Biotechnol NNPS 10_1101-2021_01_02_425006 329 3 . . . 10_1101-2021_01_02_425006 330 1 27 27 CD 10_1101-2021_01_02_425006 330 2 , , , 10_1101-2021_01_02_425006 330 3 1043 1043 CD 10_1101-2021_01_02_425006 330 4 - - SYM 10_1101-2021_01_02_425006 330 5 1049 1049 CD 10_1101-2021_01_02_425006 330 6 ( ( -LRB- 10_1101-2021_01_02_425006 330 7 2009 2009 CD 10_1101-2021_01_02_425006 330 8 ) ) -RRB- 10_1101-2021_01_02_425006 330 9 . . . 10_1101-2021_01_02_425006 331 1 466 466 CD 10_1101-2021_01_02_425006 331 2 9 9 CD 10_1101-2021_01_02_425006 331 3 . . . 10_1101-2021_01_02_425006 332 1 A. A. NNP 10_1101-2021_01_02_425006 332 2 H. H. NNP 10_1101-2021_01_02_425006 332 3 Bhat Bhat NNP 10_1101-2021_01_02_425006 332 4 , , , 10_1101-2021_01_02_425006 332 5 D. D. NNP 10_1101-2021_01_02_425006 332 6 Pathak Pathak NNP 10_1101-2021_01_02_425006 332 7 , , , 10_1101-2021_01_02_425006 332 8 A. a. 
NN 10_1101-2021_01_02_425006 332 9 Rao Rao NNP 10_1101-2021_01_02_425006 332 10 , , , 10_1101-2021_01_02_425006 332 11 The the DT 10_1101-2021_01_02_425006 332 12 alr alr NN 10_1101-2021_01_02_425006 332 13 - - HYPH 10_1101-2021_01_02_425006 332 14 groEL1 groel1 NN 10_1101-2021_01_02_425006 332 15 operon operon NN 10_1101-2021_01_02_425006 332 16 in in IN 10_1101-2021_01_02_425006 332 17 Mycobacterium Mycobacterium NNP 10_1101-2021_01_02_425006 332 18 tuberculosis tuberculosis NN 10_1101-2021_01_02_425006 332 19 : : : 10_1101-2021_01_02_425006 332 20 an an DT 10_1101-2021_01_02_425006 332 21 interplay interplay NN 10_1101-2021_01_02_425006 332 22 467 467 CD 10_1101-2021_01_02_425006 332 23 of of IN 10_1101-2021_01_02_425006 332 24 multiple multiple JJ 10_1101-2021_01_02_425006 332 25 regulatory regulatory JJ 10_1101-2021_01_02_425006 332 26 elements element NNS 10_1101-2021_01_02_425006 332 27 . . . 10_1101-2021_01_02_425006 333 1 Scientific Scientific NNP 10_1101-2021_01_02_425006 333 2 Reports report NNS 10_1101-2021_01_02_425006 333 3 7 7 CD 10_1101-2021_01_02_425006 333 4 , , , 10_1101-2021_01_02_425006 333 5 43772 43772 CD 10_1101-2021_01_02_425006 333 6 ( ( -LRB- 10_1101-2021_01_02_425006 333 7 2017 2017 CD 10_1101-2021_01_02_425006 333 8 ) ) -RRB- 10_1101-2021_01_02_425006 333 9 . . . 10_1101-2021_01_02_425006 334 1 468 468 CD 10_1101-2021_01_02_425006 334 2 10 10 CD 10_1101-2021_01_02_425006 334 3 . . . 10_1101-2021_01_02_425006 335 1 C. C. NNP 10_1101-2021_01_02_425006 335 2 M. M. NNP 10_1101-2021_01_02_425006 335 3 Sharma Sharma NNP 10_1101-2021_01_02_425006 335 4 , , , 10_1101-2021_01_02_425006 335 5 S. S. NNP 10_1101-2021_01_02_425006 335 6 Hoffmann Hoffmann NNP 10_1101-2021_01_02_425006 335 7 , , , 10_1101-2021_01_02_425006 335 8 F. F. NNP 10_1101-2021_01_02_425006 335 9 Darfeuille Darfeuille NNP 10_1101-2021_01_02_425006 335 10 , , , 10_1101-2021_01_02_425006 335 11 J. J. 
NNP 10_1101-2021_01_02_425006 335 12 Reignier Reignier NNP 10_1101-2021_01_02_425006 335 13 , , , 10_1101-2021_01_02_425006 335 14 S. S. NNP 10_1101-2021_01_02_425006 335 15 Findeiß Findeiß NNP 10_1101-2021_01_02_425006 335 16 , , , 10_1101-2021_01_02_425006 335 17 A. A. NNP 10_1101-2021_01_02_425006 335 18 Sittka Sittka NNP 10_1101-2021_01_02_425006 335 19 , , , 10_1101-2021_01_02_425006 335 20 S. S. NNP 10_1101-2021_01_02_425006 335 21 Chabas Chabas NNP 10_1101-2021_01_02_425006 335 22 , , , 10_1101-2021_01_02_425006 335 23 K. K. NNP 10_1101-2021_01_02_425006 335 24 Reiche Reiche NNP 10_1101-2021_01_02_425006 335 25 , , , 10_1101-2021_01_02_425006 335 26 469 469 CD 10_1101-2021_01_02_425006 335 27 J. J. NNP 10_1101-2021_01_02_425006 335 28 Hackermüller Hackermüller NNP 10_1101-2021_01_02_425006 335 29 , , , 10_1101-2021_01_02_425006 335 30 R. R. NNP 10_1101-2021_01_02_425006 335 31 Reinhardt Reinhardt NNP 10_1101-2021_01_02_425006 335 32 , , , 10_1101-2021_01_02_425006 335 33 The the DT 10_1101-2021_01_02_425006 335 34 primary primary JJ 10_1101-2021_01_02_425006 335 35 transcriptome transcriptome NN 10_1101-2021_01_02_425006 335 36 of of IN 10_1101-2021_01_02_425006 335 37 the the DT 10_1101-2021_01_02_425006 335 38 major major JJ 10_1101-2021_01_02_425006 335 39 human human JJ 10_1101-2021_01_02_425006 335 40 pathogen pathogen NN 10_1101-2021_01_02_425006 335 41 470 470 CD 10_1101-2021_01_02_425006 335 42 Helicobacter Helicobacter NNP 10_1101-2021_01_02_425006 335 43 pylori pylori NN 10_1101-2021_01_02_425006 335 44 . . . 10_1101-2021_01_02_425006 336 1 Nature nature NN 10_1101-2021_01_02_425006 336 2 464 464 CD 10_1101-2021_01_02_425006 336 3 , , , 10_1101-2021_01_02_425006 336 4 250 250 CD 10_1101-2021_01_02_425006 336 5 - - HYPH 10_1101-2021_01_02_425006 336 6 255 255 CD 10_1101-2021_01_02_425006 336 7 ( ( -LRB- 10_1101-2021_01_02_425006 336 8 2010 2010 CD 10_1101-2021_01_02_425006 336 9 ) ) -RRB- 10_1101-2021_01_02_425006 336 10 . . . 
10_1101-2021_01_02_425006 337 1 471 471 CD 10_1101-2021_01_02_425006 337 2 11 11 CD 10_1101-2021_01_02_425006 337 3 . . . 10_1101-2021_01_02_425006 338 1 J. J. NNP 10_1101-2021_01_02_425006 338 2 M. M. NNP 10_1101-2021_01_02_425006 338 3 Durand Durand NNP 10_1101-2021_01_02_425006 338 4 , , , 10_1101-2021_01_02_425006 338 5 G. G. NNP 10_1101-2021_01_02_425006 338 6 R. R. NNP 10_1101-2021_01_02_425006 338 7 Bjork Bjork NNP 10_1101-2021_01_02_425006 338 8 , , , 10_1101-2021_01_02_425006 338 9 Putrescine Putrescine NNP 10_1101-2021_01_02_425006 338 10 or or CC 10_1101-2021_01_02_425006 338 11 a a DT 10_1101-2021_01_02_425006 338 12 combination combination NN 10_1101-2021_01_02_425006 338 13 of of IN 10_1101-2021_01_02_425006 338 14 methionine methionine NN 10_1101-2021_01_02_425006 338 15 and and CC 10_1101-2021_01_02_425006 338 16 arginine arginine NNP 10_1101-2021_01_02_425006 338 17 restores restore VBZ 10_1101-2021_01_02_425006 338 18 472 472 CD 10_1101-2021_01_02_425006 338 19 virulence virulence NN 10_1101-2021_01_02_425006 338 20 gene gene NN 10_1101-2021_01_02_425006 338 21 expression expression NN 10_1101-2021_01_02_425006 338 22 in in IN 10_1101-2021_01_02_425006 338 23 a a DT 10_1101-2021_01_02_425006 338 24 tRNA trna NN 10_1101-2021_01_02_425006 338 25 modification modification NN 10_1101-2021_01_02_425006 338 26 - - HYPH 10_1101-2021_01_02_425006 338 27 deficient deficient JJ 10_1101-2021_01_02_425006 338 28 mutant mutant NN 10_1101-2021_01_02_425006 338 29 of of IN 10_1101-2021_01_02_425006 338 30 Shigella Shigella NNP 10_1101-2021_01_02_425006 338 31 flexneri flexneri NN 10_1101-2021_01_02_425006 338 32 : : : 10_1101-2021_01_02_425006 338 33 a a DT 10_1101-2021_01_02_425006 338 34 possible possible JJ 10_1101-2021_01_02_425006 338 35 473 473 CD 10_1101-2021_01_02_425006 338 36 role role NN 10_1101-2021_01_02_425006 338 37 in in IN 10_1101-2021_01_02_425006 338 38 adaptation adaptation NN 10_1101-2021_01_02_425006 338 39 of of IN 
10_1101-2021_01_02_425006 338 40 virulence virulence NN 10_1101-2021_01_02_425006 338 41 . . . 10_1101-2021_01_02_425006 339 1 Mol Mol NNP 10_1101-2021_01_02_425006 339 2 . . . 10_1101-2021_01_02_425006 340 1 Microbiol Microbiol NNP 10_1101-2021_01_02_425006 340 2 . . . 10_1101-2021_01_02_425006 341 1 47 47 CD 10_1101-2021_01_02_425006 341 2 , , , 10_1101-2021_01_02_425006 341 3 519 519 CD 10_1101-2021_01_02_425006 341 4 - - SYM 10_1101-2021_01_02_425006 341 5 527 527 CD 10_1101-2021_01_02_425006 341 6 ( ( -LRB- 10_1101-2021_01_02_425006 341 7 2010 2010 CD 10_1101-2021_01_02_425006 341 8 ) ) -RRB- 10_1101-2021_01_02_425006 341 9 . . . 10_1101-2021_01_02_425006 342 1 474 474 CD 10_1101-2021_01_02_425006 342 2 12 12 CD 10_1101-2021_01_02_425006 342 3 . . . 10_1101-2021_01_02_425006 343 1 L. L. NNP 10_1101-2021_01_02_425006 343 2 E. E. NNP 10_1101-2021_01_02_425006 343 3 Wroblewski Wroblewski NNP 10_1101-2021_01_02_425006 343 4 , , , 10_1101-2021_01_02_425006 343 5 R. R. NNP 10_1101-2021_01_02_425006 343 6 M. M. NNP 10_1101-2021_01_02_425006 343 7 Peek Peek NNP 10_1101-2021_01_02_425006 343 8 , , , 10_1101-2021_01_02_425006 343 9 K. K. NNP 10_1101-2021_01_02_425006 343 10 T. T. NNP 10_1101-2021_01_02_425006 343 11 Wilson Wilson NNP 10_1101-2021_01_02_425006 343 12 , , , 10_1101-2021_01_02_425006 343 13 Helicobacter Helicobacter NNP 10_1101-2021_01_02_425006 343 14 pylori pylori NN 10_1101-2021_01_02_425006 343 15 and and CC 10_1101-2021_01_02_425006 343 16 gastric gastric JJ 10_1101-2021_01_02_425006 343 17 cancer cancer NN 10_1101-2021_01_02_425006 343 18 : : : 10_1101-2021_01_02_425006 343 19 factors factor NNS 10_1101-2021_01_02_425006 343 20 that that IN 10_1101-2021_01_02_425006 343 21 475 475 CD 10_1101-2021_01_02_425006 343 22 modulate modulate NN 10_1101-2021_01_02_425006 343 23 disease disease NN 10_1101-2021_01_02_425006 343 24 risk risk NN 10_1101-2021_01_02_425006 343 25 . . . 10_1101-2021_01_02_425006 344 1 Clin Clin NNP 10_1101-2021_01_02_425006 344 2 . 
. . 10_1101-2021_01_02_425006 345 1 Microbiol Microbiol NNP 10_1101-2021_01_02_425006 345 2 . . . 10_1101-2021_01_02_425006 346 1 Rev. Rev. NNP 10_1101-2021_01_02_425006 347 1 23 23 CD 10_1101-2021_01_02_425006 347 2 , , , 10_1101-2021_01_02_425006 347 3 713 713 CD 10_1101-2021_01_02_425006 347 4 - - SYM 10_1101-2021_01_02_425006 347 5 739 739 CD 10_1101-2021_01_02_425006 347 6 ( ( -LRB- 10_1101-2021_01_02_425006 347 7 2010 2010 CD 10_1101-2021_01_02_425006 347 8 ) ) -RRB- 10_1101-2021_01_02_425006 347 9 . . . 10_1101-2021_01_02_425006 348 1 476 476 CD 10_1101-2021_01_02_425006 348 2 13 13 CD 10_1101-2021_01_02_425006 348 3 . . . 10_1101-2021_01_02_425006 349 1 L. L. NNP 10_1101-2021_01_02_425006 349 2 Ettwiller Ettwiller NNP 10_1101-2021_01_02_425006 349 3 , , , 10_1101-2021_01_02_425006 349 4 J. J. NNP 10_1101-2021_01_02_425006 349 5 Buswell Buswell NNP 10_1101-2021_01_02_425006 349 6 , , , 10_1101-2021_01_02_425006 349 7 E. E. NNP 10_1101-2021_01_02_425006 349 8 Yigit Yigit NNP 10_1101-2021_01_02_425006 349 9 , , , 10_1101-2021_01_02_425006 349 10 I. I. 
NNP 10_1101-2021_01_02_425006 349 11 Schildkraut Schildkraut NNP 10_1101-2021_01_02_425006 349 12 , , , 10_1101-2021_01_02_425006 349 13 A a DT 10_1101-2021_01_02_425006 349 14 novel novel JJ 10_1101-2021_01_02_425006 349 15 enrichment enrichment NN 10_1101-2021_01_02_425006 349 16 strategy strategy NN 10_1101-2021_01_02_425006 349 17 reveals reveal VBZ 10_1101-2021_01_02_425006 349 18 unprecedented unprecedented JJ 10_1101-2021_01_02_425006 349 19 477 477 CD 10_1101-2021_01_02_425006 349 20 number number NN 10_1101-2021_01_02_425006 349 21 of of IN 10_1101-2021_01_02_425006 349 22 novel novel JJ 10_1101-2021_01_02_425006 349 23 transcription transcription NN 10_1101-2021_01_02_425006 349 24 start start NN 10_1101-2021_01_02_425006 349 25 sites site NNS 10_1101-2021_01_02_425006 349 26 at at IN 10_1101-2021_01_02_425006 349 27 single single JJ 10_1101-2021_01_02_425006 349 28 base base NN 10_1101-2021_01_02_425006 349 29 resolution resolution NN 10_1101-2021_01_02_425006 349 30 in in IN 10_1101-2021_01_02_425006 349 31 a a DT 10_1101-2021_01_02_425006 349 32 model model NN 10_1101-2021_01_02_425006 349 33 prokaryote prokaryote NN 10_1101-2021_01_02_425006 349 34 and and CC 10_1101-2021_01_02_425006 349 35 the the DT 10_1101-2021_01_02_425006 349 36 478 478 CD 10_1101-2021_01_02_425006 349 37 gut gut NN 10_1101-2021_01_02_425006 349 38 microbiome microbiome NN 10_1101-2021_01_02_425006 349 39 . . . 10_1101-2021_01_02_425006 350 1 BMC BMC NNP 10_1101-2021_01_02_425006 350 2 Genomics Genomics NNP 10_1101-2021_01_02_425006 350 3 17 17 CD 10_1101-2021_01_02_425006 350 4 , , , 10_1101-2021_01_02_425006 350 5 199 199 CD 10_1101-2021_01_02_425006 350 6 - - SYM 10_1101-2021_01_02_425006 350 7 199 199 CD 10_1101-2021_01_02_425006 350 8 ( ( -LRB- 10_1101-2021_01_02_425006 350 9 2016 2016 CD 10_1101-2021_01_02_425006 350 10 ) ) -RRB- 10_1101-2021_01_02_425006 350 11 . . . 
10_1101-2021_01_02_425006 351 1 479 479 CD 10_1101-2021_01_02_425006 352 25 14 14 CD 10_1101-2021_01_02_425006 352 26 . . . 10_1101-2021_01_02_425006 353 1 M. M. NNP 10_1101-2021_01_02_425006 353 2 K. K. NNP 10_1101-2021_01_02_425006 353 3 Thomason Thomason NNP 10_1101-2021_01_02_425006 353 4 , , , 10_1101-2021_01_02_425006 353 5 T. T. NNP 10_1101-2021_01_02_425006 353 6 Bischler Bischler NNP 10_1101-2021_01_02_425006 353 7 , , , 10_1101-2021_01_02_425006 353 8 S. S. NNP 10_1101-2021_01_02_425006 353 9 K. K. NNP 10_1101-2021_01_02_425006 353 10 Eisenbart Eisenbart NNP 10_1101-2021_01_02_425006 353 11 , , , 10_1101-2021_01_02_425006 353 12 K. K.
NNP 10_1101-2021_01_02_425006 353 13 U. U. NNP 10_1101-2021_01_02_425006 353 14 Forstner Forstner NNP 10_1101-2021_01_02_425006 353 15 , , , 10_1101-2021_01_02_425006 353 16 A. A. NNP 10_1101-2021_01_02_425006 353 17 Zhang Zhang NNP 10_1101-2021_01_02_425006 353 18 , , , 10_1101-2021_01_02_425006 353 19 A. A. NNP 10_1101-2021_01_02_425006 353 20 Herbig Herbig NNP 10_1101-2021_01_02_425006 353 21 , , , 10_1101-2021_01_02_425006 353 22 K. K. NNP 10_1101-2021_01_02_425006 353 23 Nieselt Nieselt NNP 10_1101-2021_01_02_425006 353 24 , , , 10_1101-2021_01_02_425006 353 25 C. C. NNP 10_1101-2021_01_02_425006 353 26 480 480 CD 10_1101-2021_01_02_425006 353 27 M. M. NNP 10_1101-2021_01_02_425006 353 28 Sharma Sharma NNP 10_1101-2021_01_02_425006 353 29 , , , 10_1101-2021_01_02_425006 353 30 G. G. NNP 10_1101-2021_01_02_425006 353 31 Storz Storz NNP 10_1101-2021_01_02_425006 353 32 , , , 10_1101-2021_01_02_425006 353 33 Global global JJ 10_1101-2021_01_02_425006 353 34 transcriptional transcriptional JJ 10_1101-2021_01_02_425006 353 35 start start NN 10_1101-2021_01_02_425006 353 36 site site NN 10_1101-2021_01_02_425006 353 37 mapping mapping NN 10_1101-2021_01_02_425006 353 38 using use VBG 10_1101-2021_01_02_425006 353 39 differential differential NN 10_1101-2021_01_02_425006 353 40 RNA RNA NNP 10_1101-2021_01_02_425006 353 41 sequencing sequence VBG 10_1101-2021_01_02_425006 353 42 481 481 CD 10_1101-2021_01_02_425006 353 43 reveals reveal NNS 10_1101-2021_01_02_425006 353 44 novel novel JJ 10_1101-2021_01_02_425006 353 45 antisense antisense NN 10_1101-2021_01_02_425006 353 46 RNAs rna NNS 10_1101-2021_01_02_425006 353 47 in in IN 10_1101-2021_01_02_425006 353 48 Escherichia Escherichia NNP 10_1101-2021_01_02_425006 353 49 coli coli NNS 10_1101-2021_01_02_425006 353 50 . . . 10_1101-2021_01_02_425006 354 1 J. J. NNP 10_1101-2021_01_02_425006 354 2 Bacteriol Bacteriol NNP 10_1101-2021_01_02_425006 354 3 . . . 
10_1101-2021_01_02_425006 355 1 197 197 CD 10_1101-2021_01_02_425006 355 2 , , , 10_1101-2021_01_02_425006 355 3 18 18 CD 10_1101-2021_01_02_425006 355 4 - - SYM 10_1101-2021_01_02_425006 355 5 28 28 CD 10_1101-2021_01_02_425006 355 6 ( ( -LRB- 10_1101-2021_01_02_425006 355 7 2015 2015 CD 10_1101-2021_01_02_425006 355 8 ) ) -RRB- 10_1101-2021_01_02_425006 355 9 . . . 10_1101-2021_01_02_425006 356 1 482 482 CD 10_1101-2021_01_02_425006 356 2 15 15 CD 10_1101-2021_01_02_425006 356 3 . . . 10_1101-2021_01_02_425006 357 1 T. T. NNP 10_1101-2021_01_02_425006 357 2 Bischler Bischler NNP 10_1101-2021_01_02_425006 357 3 , , , 10_1101-2021_01_02_425006 357 4 H. H. NNP 10_1101-2021_01_02_425006 357 5 S. S. NNP 10_1101-2021_01_02_425006 357 6 Tan Tan NNP 10_1101-2021_01_02_425006 357 7 , , , 10_1101-2021_01_02_425006 357 8 K. K. NNP 10_1101-2021_01_02_425006 357 9 Nieselt Nieselt NNP 10_1101-2021_01_02_425006 357 10 , , , 10_1101-2021_01_02_425006 357 11 C. C. NNP 10_1101-2021_01_02_425006 357 12 M. M. 
NNP 10_1101-2021_01_02_425006 357 13 Sharma Sharma NNP 10_1101-2021_01_02_425006 357 14 , , , 10_1101-2021_01_02_425006 357 15 Differential Differential NNP 10_1101-2021_01_02_425006 357 16 RNA RNA NNP 10_1101-2021_01_02_425006 357 17 - - HYPH 10_1101-2021_01_02_425006 357 18 seq seq NNP 10_1101-2021_01_02_425006 357 19 ( ( -LRB- 10_1101-2021_01_02_425006 357 20 dRNA drna NN 10_1101-2021_01_02_425006 357 21 - - HYPH 10_1101-2021_01_02_425006 357 22 seq seq NN 10_1101-2021_01_02_425006 357 23 ) ) -RRB- 10_1101-2021_01_02_425006 357 24 for for IN 10_1101-2021_01_02_425006 357 25 annotation annotation NN 10_1101-2021_01_02_425006 357 26 483 483 CD 10_1101-2021_01_02_425006 357 27 of of IN 10_1101-2021_01_02_425006 357 28 transcriptional transcriptional JJ 10_1101-2021_01_02_425006 357 29 start start NN 10_1101-2021_01_02_425006 357 30 sites site NNS 10_1101-2021_01_02_425006 357 31 and and CC 10_1101-2021_01_02_425006 357 32 small small JJ 10_1101-2021_01_02_425006 357 33 RNAs rna NNS 10_1101-2021_01_02_425006 357 34 in in IN 10_1101-2021_01_02_425006 357 35 Helicobacter Helicobacter NNP 10_1101-2021_01_02_425006 357 36 pylori pylori NN 10_1101-2021_01_02_425006 357 37 . . . 10_1101-2021_01_02_425006 358 1 Methods method NNS 10_1101-2021_01_02_425006 358 2 86 86 CD 10_1101-2021_01_02_425006 358 3 , , , 10_1101-2021_01_02_425006 358 4 89 89 CD 10_1101-2021_01_02_425006 358 5 - - SYM 10_1101-2021_01_02_425006 358 6 101 101 CD 10_1101-2021_01_02_425006 358 7 ( ( -LRB- 10_1101-2021_01_02_425006 358 8 2015 2015 CD 10_1101-2021_01_02_425006 358 9 ) ) -RRB- 10_1101-2021_01_02_425006 358 10 . . . 10_1101-2021_01_02_425006 359 1 484 484 CD 10_1101-2021_01_02_425006 359 2 16 16 CD 10_1101-2021_01_02_425006 359 3 . . . 10_1101-2021_01_02_425006 360 1 D. D. NNP 10_1101-2021_01_02_425006 360 2 Dar Dar NNP 10_1101-2021_01_02_425006 360 3 , , , 10_1101-2021_01_02_425006 360 4 M. M. 
NNP 10_1101-2021_01_02_425006 360 5 Shamir Shamir NNP 10_1101-2021_01_02_425006 360 6 , , , 10_1101-2021_01_02_425006 360 7 J. J. NNP 10_1101-2021_01_02_425006 360 8 Mellin Mellin NNP 10_1101-2021_01_02_425006 360 9 , , , 10_1101-2021_01_02_425006 360 10 M. M. NNP 10_1101-2021_01_02_425006 360 11 Koutero Koutero NNP 10_1101-2021_01_02_425006 360 12 , , , 10_1101-2021_01_02_425006 360 13 N. N. NNP 10_1101-2021_01_02_425006 360 14 Stern Stern NNP 10_1101-2021_01_02_425006 360 15 - - HYPH 10_1101-2021_01_02_425006 360 16 Ginossar Ginossar NNP 10_1101-2021_01_02_425006 360 17 , , , 10_1101-2021_01_02_425006 360 18 P. P. NNP 10_1101-2021_01_02_425006 360 19 Cossart Cossart NNP 10_1101-2021_01_02_425006 360 20 , , , 10_1101-2021_01_02_425006 360 21 R. R. NNP 10_1101-2021_01_02_425006 360 22 Sorek Sorek NNP 10_1101-2021_01_02_425006 360 23 , , , 10_1101-2021_01_02_425006 360 24 Term Term NNP 10_1101-2021_01_02_425006 360 25 - - HYPH 10_1101-2021_01_02_425006 360 26 seq seq NN 10_1101-2021_01_02_425006 360 27 485 485 CD 10_1101-2021_01_02_425006 360 28 reveals reveal VBZ 10_1101-2021_01_02_425006 360 29 abundant abundant JJ 10_1101-2021_01_02_425006 360 30 ribo ribo NN 10_1101-2021_01_02_425006 360 31 - - HYPH 10_1101-2021_01_02_425006 360 32 regulation regulation NN 10_1101-2021_01_02_425006 360 33 of of IN 10_1101-2021_01_02_425006 360 34 antibiotics antibiotic NNS 10_1101-2021_01_02_425006 360 35 resistance resistance NN 10_1101-2021_01_02_425006 360 36 in in IN 10_1101-2021_01_02_425006 360 37 bacteria bacteria NNS 10_1101-2021_01_02_425006 360 38 . . . 10_1101-2021_01_02_425006 361 1 Science science NN 10_1101-2021_01_02_425006 361 2 352 352 CD 10_1101-2021_01_02_425006 361 3 , , , 10_1101-2021_01_02_425006 361 4 6282 6282 CD 10_1101-2021_01_02_425006 361 5 ( ( -LRB- 10_1101-2021_01_02_425006 361 6 2016 2016 CD 10_1101-2021_01_02_425006 361 7 ) ) -RRB- 10_1101-2021_01_02_425006 361 8 . . . 
10_1101-2021_01_02_425006 362 1 486 486 CD 10_1101-2021_01_02_425006 362 2 17 17 CD 10_1101-2021_01_02_425006 362 3 . . . 10_1101-2021_01_02_425006 363 1 J. J. NNP 10_1101-2021_01_02_425006 363 2 Clauwaert Clauwaert NNP 10_1101-2021_01_02_425006 363 3 , , , 10_1101-2021_01_02_425006 363 4 G. G. NNP 10_1101-2021_01_02_425006 363 5 Menschaert Menschaert NNP 10_1101-2021_01_02_425006 363 6 , , , 10_1101-2021_01_02_425006 363 7 W. W. NNP 10_1101-2021_01_02_425006 363 8 Waegeman Waegeman NNP 10_1101-2021_01_02_425006 363 9 , , , 10_1101-2021_01_02_425006 363 10 An an DT 10_1101-2021_01_02_425006 363 11 in in IN 10_1101-2021_01_02_425006 363 12 - - HYPH 10_1101-2021_01_02_425006 363 13 depth depth NN 10_1101-2021_01_02_425006 363 14 evaluation evaluation NN 10_1101-2021_01_02_425006 363 15 of of IN 10_1101-2021_01_02_425006 363 16 annotated annotate VBN 10_1101-2021_01_02_425006 363 17 transcription transcription NN 10_1101-2021_01_02_425006 363 18 487 487 CD 10_1101-2021_01_02_425006 363 19 start start NN 10_1101-2021_01_02_425006 363 20 sites site NNS 10_1101-2021_01_02_425006 363 21 in in IN 10_1101-2021_01_02_425006 363 22 E. E. NNP 10_1101-2021_01_02_425006 363 23 coli coli NNS 10_1101-2021_01_02_425006 363 24 using use VBG 10_1101-2021_01_02_425006 363 25 deep deep JJ 10_1101-2021_01_02_425006 363 26 learning learning NN 10_1101-2021_01_02_425006 363 27 . . . 
10_1101-2021_01_02_425006 364 1 bioRxiv biorxiv VB 10_1101-2021_01_02_425006 364 2 , , , 10_1101-2021_01_02_425006 364 3 doi doi XX 10_1101-2021_01_02_425006 364 4 : : : 10_1101-2021_01_02_425006 364 5 https://doi.org/10.1101/2020.03.16.993501 https://doi.org/10.1101/2020.03.16.993501 ADD 10_1101-2021_01_02_425006 364 6 , , , 10_1101-2021_01_02_425006 364 7 488 488 CD 10_1101-2021_01_02_425006 364 8 4 4 CD 10_1101-2021_01_02_425006 364 9 November November NNP 10_1101-2021_01_02_425006 364 10 2020 2020 CD 10_1101-2021_01_02_425006 364 11 , , , 10_1101-2021_01_02_425006 364 12 pre pre NN 10_1101-2021_01_02_425006 364 13 - - NN 10_1101-2021_01_02_425006 364 14 print print NN 10_1101-2021_01_02_425006 364 15 : : : 10_1101-2021_01_02_425006 364 16 not not RB 10_1101-2021_01_02_425006 364 17 peer peer NN 10_1101-2021_01_02_425006 364 18 - - HYPH 10_1101-2021_01_02_425006 364 19 reviewed review VBN 10_1101-2021_01_02_425006 364 20 . . . 10_1101-2021_01_02_425006 365 1 ( ( -LRB- 10_1101-2021_01_02_425006 365 2 2020 2020 CD 10_1101-2021_01_02_425006 365 3 ) ) -RRB- 10_1101-2021_01_02_425006 365 4 . . . 10_1101-2021_01_02_425006 366 1 489 489 CD 10_1101-2021_01_02_425006 366 2 18 18 CD 10_1101-2021_01_02_425006 366 3 . . . 10_1101-2021_01_02_425006 367 1 S. S. NNP 10_1101-2021_01_02_425006 367 2 Goodwin Goodwin NNP 10_1101-2021_01_02_425006 367 3 , , , 10_1101-2021_01_02_425006 367 4 J. J. NNP 10_1101-2021_01_02_425006 367 5 D. D. NNP 10_1101-2021_01_02_425006 367 6 Mcpherson Mcpherson NNP 10_1101-2021_01_02_425006 367 7 , , , 10_1101-2021_01_02_425006 367 8 W. W. NNP 10_1101-2021_01_02_425006 367 9 R. R. 
NNP 10_1101-2021_01_02_425006 367 10 Mccombie Mccombie NNP 10_1101-2021_01_02_425006 367 11 , , , 10_1101-2021_01_02_425006 367 12 Coming come VBG 10_1101-2021_01_02_425006 367 13 of of IN 10_1101-2021_01_02_425006 367 14 age age NN 10_1101-2021_01_02_425006 367 15 : : : 10_1101-2021_01_02_425006 367 16 ten ten CD 10_1101-2021_01_02_425006 367 17 years year NNS 10_1101-2021_01_02_425006 367 18 of of IN 10_1101-2021_01_02_425006 367 19 next next JJ 10_1101-2021_01_02_425006 367 20 - - HYPH 10_1101-2021_01_02_425006 367 21 generation generation NN 10_1101-2021_01_02_425006 367 22 490 490 CD 10_1101-2021_01_02_425006 367 23 sequencing sequencing NN 10_1101-2021_01_02_425006 367 24 technologies technology NNS 10_1101-2021_01_02_425006 367 25 . . . 10_1101-2021_01_02_425006 368 1 Nat Nat NNP 10_1101-2021_01_02_425006 368 2 . . . 10_1101-2021_01_02_425006 369 1 Rev. Rev. NNP 10_1101-2021_01_02_425006 370 1 Genet Genet NNP 10_1101-2021_01_02_425006 370 2 . . . 10_1101-2021_01_02_425006 371 1 17 17 CD 10_1101-2021_01_02_425006 371 2 , , , 10_1101-2021_01_02_425006 371 3 333 333 CD 10_1101-2021_01_02_425006 371 4 - - HYPH 10_1101-2021_01_02_425006 371 5 351 351 CD 10_1101-2021_01_02_425006 371 6 ( ( -LRB- 10_1101-2021_01_02_425006 371 7 2016 2016 CD 10_1101-2021_01_02_425006 371 8 ) ) -RRB- 10_1101-2021_01_02_425006 371 9 . . . 10_1101-2021_01_02_425006 372 1 491 491 CD 10_1101-2021_01_02_425006 372 2 19 19 CD 10_1101-2021_01_02_425006 372 3 . . . 10_1101-2021_01_02_425006 373 1 A. A. NNP 10_1101-2021_01_02_425006 373 2 Santos Santos NNP 10_1101-2021_01_02_425006 373 3 - - HYPH 10_1101-2021_01_02_425006 373 4 Zavaleta Zavaleta NNP 10_1101-2021_01_02_425006 373 5 , , , 10_1101-2021_01_02_425006 373 6 H. H. NNP 10_1101-2021_01_02_425006 373 7 Salgado Salgado NNP 10_1101-2021_01_02_425006 373 8 , , , 10_1101-2021_01_02_425006 373 9 S. S. 
NNP 10_1101-2021_01_02_425006 373 10 Gama Gama NNP 10_1101-2021_01_02_425006 373 11 - - HYPH 10_1101-2021_01_02_425006 373 12 Castro Castro NNP 10_1101-2021_01_02_425006 373 13 , , , 10_1101-2021_01_02_425006 373 14 M. M. NNP 10_1101-2021_01_02_425006 373 15 Sánchez Sánchez NNP 10_1101-2021_01_02_425006 373 16 - - HYPH 10_1101-2021_01_02_425006 373 17 Pérez Pérez NNP 10_1101-2021_01_02_425006 373 18 , , , 10_1101-2021_01_02_425006 373 19 L. L. NNP 10_1101-2021_01_02_425006 373 20 Gómez Gómez NNP 10_1101-2021_01_02_425006 373 21 - - HYPH 10_1101-2021_01_02_425006 373 22 Romero Romero NNP 10_1101-2021_01_02_425006 373 23 , , , 10_1101-2021_01_02_425006 373 24 D. D. NNP 10_1101-2021_01_02_425006 373 25 492 492 CD 10_1101-2021_01_02_425006 373 26 Ledezma Ledezma NNP 10_1101-2021_01_02_425006 373 27 - - HYPH 10_1101-2021_01_02_425006 373 28 Tejeida Tejeida NNP 10_1101-2021_01_02_425006 373 29 , , , 10_1101-2021_01_02_425006 373 30 J. J. NNP 10_1101-2021_01_02_425006 373 31 S. S. NNP 10_1101-2021_01_02_425006 373 32 García García NNP 10_1101-2021_01_02_425006 373 33 - - HYPH 10_1101-2021_01_02_425006 373 34 Sotelo Sotelo NNP 10_1101-2021_01_02_425006 373 35 , , , 10_1101-2021_01_02_425006 373 36 K. K. NNP 10_1101-2021_01_02_425006 373 37 Alquicira Alquicira NNP 10_1101-2021_01_02_425006 373 38 - - HYPH 10_1101-2021_01_02_425006 373 39 Hernández Hernández NNP 10_1101-2021_01_02_425006 373 40 , , , 10_1101-2021_01_02_425006 373 41 L. L. NNP 10_1101-2021_01_02_425006 373 42 J. J. NNP 10_1101-2021_01_02_425006 373 43 Muñiz Muñiz NNP 10_1101-2021_01_02_425006 373 44 - - HYPH 10_1101-2021_01_02_425006 373 45 Rascado Rascado NNP 10_1101-2021_01_02_425006 373 46 , , , 10_1101-2021_01_02_425006 373 47 P. P. 
NNP 10_1101-2021_01_02_425006 373 48 Peña-493 peña-493 NN 10_1101-2021_01_02_425006 373 49 Loredo Loredo NNP 10_1101-2021_01_02_425006 373 50 , , , 10_1101-2021_01_02_425006 373 51 RegulonDB regulondb IN 10_1101-2021_01_02_425006 373 52 v v NN 10_1101-2021_01_02_425006 373 53 10.5 10.5 CD 10_1101-2021_01_02_425006 373 54 : : : 10_1101-2021_01_02_425006 373 55 tackling tackle VBG 10_1101-2021_01_02_425006 373 56 challenges challenge NNS 10_1101-2021_01_02_425006 373 57 to to TO 10_1101-2021_01_02_425006 373 58 unify unify VB 10_1101-2021_01_02_425006 373 59 classic classic JJ 10_1101-2021_01_02_425006 373 60 and and CC 10_1101-2021_01_02_425006 373 61 high high JJ 10_1101-2021_01_02_425006 373 62 throughput throughput NN 10_1101-2021_01_02_425006 373 63 knowledge knowledge NN 10_1101-2021_01_02_425006 373 64 494 494 CD 10_1101-2021_01_02_425006 373 65 of of IN 10_1101-2021_01_02_425006 373 66 gene gene NN 10_1101-2021_01_02_425006 373 67 regulation regulation NN 10_1101-2021_01_02_425006 373 68 in in IN 10_1101-2021_01_02_425006 373 69 E. E. NNP 10_1101-2021_01_02_425006 373 70 coli coli NNS 10_1101-2021_01_02_425006 373 71 K-12 k-12 CD 10_1101-2021_01_02_425006 373 72 . . . 10_1101-2021_01_02_425006 374 1 Nucleic Nucleic NNP 10_1101-2021_01_02_425006 374 2 Acids Acids NNPS 10_1101-2021_01_02_425006 374 3 Res Res NNP 10_1101-2021_01_02_425006 374 4 . . . 10_1101-2021_01_02_425006 375 1 47 47 CD 10_1101-2021_01_02_425006 375 2 , , , 10_1101-2021_01_02_425006 375 3 D212-D220 D212-D220 NNP 10_1101-2021_01_02_425006 375 4 ( ( -LRB- 10_1101-2021_01_02_425006 375 5 2018 2018 CD 10_1101-2021_01_02_425006 375 6 ) ) -RRB- 10_1101-2021_01_02_425006 375 7 . . . 10_1101-2021_01_02_425006 376 1 495 495 CD 10_1101-2021_01_02_425006 376 2 20 20 CD 10_1101-2021_01_02_425006 376 3 . . . 10_1101-2021_01_02_425006 377 1 N. N. NNP 10_1101-2021_01_02_425006 377 2 Sierro Sierro NNP 10_1101-2021_01_02_425006 377 3 , , , 10_1101-2021_01_02_425006 377 4 Y. Y. 
NNP 10_1101-2021_01_02_425006 377 5 Makita Makita NNP 10_1101-2021_01_02_425006 377 6 , , , 10_1101-2021_01_02_425006 377 7 M. M. NNP 10_1101-2021_01_02_425006 377 8 J. J. NNP 10_1101-2021_01_02_425006 377 9 L. L. NNP 10_1101-2021_01_02_425006 377 10 De De NNP 10_1101-2021_01_02_425006 377 11 Hoon Hoon NNP 10_1101-2021_01_02_425006 377 12 , , , 10_1101-2021_01_02_425006 377 13 K. K. NNP 10_1101-2021_01_02_425006 377 14 Nakai Nakai NNP 10_1101-2021_01_02_425006 377 15 , , , 10_1101-2021_01_02_425006 377 16 DBTBS DBTBS NNP 10_1101-2021_01_02_425006 377 17 : : : 10_1101-2021_01_02_425006 377 18 a a DT 10_1101-2021_01_02_425006 377 19 database database NN 10_1101-2021_01_02_425006 377 20 of of IN 10_1101-2021_01_02_425006 377 21 transcriptional transcriptional JJ 10_1101-2021_01_02_425006 377 22 regulation regulation NN 10_1101-2021_01_02_425006 377 23 496 496 CD 10_1101-2021_01_02_425006 377 24 in in IN 10_1101-2021_01_02_425006 377 25 Bacillus bacillus NN 10_1101-2021_01_02_425006 377 26 subtilis subtili NNS 10_1101-2021_01_02_425006 377 27 containing contain VBG 10_1101-2021_01_02_425006 377 28 upstream upstream JJ 10_1101-2021_01_02_425006 377 29 intergenic intergenic JJ 10_1101-2021_01_02_425006 377 30 conservation conservation NN 10_1101-2021_01_02_425006 377 31 information information NN 10_1101-2021_01_02_425006 377 32 . . . 10_1101-2021_01_02_425006 378 1 Nucleic Nucleic NNP 10_1101-2021_01_02_425006 378 2 Acids Acids NNPS 10_1101-2021_01_02_425006 378 3 Res Res NNP 10_1101-2021_01_02_425006 378 4 . . . 
10_1101-2021_01_02_425006 379 1 497 497 CD 10_1101-2021_01_02_425006 380 25 36 36 CD 10_1101-2021_01_02_425006 380 26 , , , 10_1101-2021_01_02_425006 380 27 93 93 CD 10_1101-2021_01_02_425006 380 28 - - SYM 10_1101-2021_01_02_425006 380 29 96 96 CD 10_1101-2021_01_02_425006 380 30 ( ( -LRB- 10_1101-2021_01_02_425006 380 31 2008 2008 CD 10_1101-2021_01_02_425006 380 32 ) ) -RRB- 10_1101-2021_01_02_425006 380 33 . . . 10_1101-2021_01_02_425006 381 1 498 498 CD 10_1101-2021_01_02_425006 381 2 21 21 CD 10_1101-2021_01_02_425006 381 3 . . . 10_1101-2021_01_02_425006 382 1 P. P. NNP 10_1101-2021_01_02_425006 382 2 S. S. 
NNP 10_1101-2021_01_02_425006 382 3 Dehal Dehal NNP 10_1101-2021_01_02_425006 382 4 , , , 10_1101-2021_01_02_425006 382 5 M. M. NNP 10_1101-2021_01_02_425006 382 6 P. P. NNP 10_1101-2021_01_02_425006 382 7 Joachimiak Joachimiak NNP 10_1101-2021_01_02_425006 382 8 , , , 10_1101-2021_01_02_425006 382 9 M. M. NNP 10_1101-2021_01_02_425006 382 10 N. N. NNP 10_1101-2021_01_02_425006 382 11 Price Price NNP 10_1101-2021_01_02_425006 382 12 , , , 10_1101-2021_01_02_425006 382 13 J. J. NNP 10_1101-2021_01_02_425006 382 14 T. T. NNP 10_1101-2021_01_02_425006 382 15 Bates Bates NNP 10_1101-2021_01_02_425006 382 16 , , , 10_1101-2021_01_02_425006 382 17 J. J. NNP 10_1101-2021_01_02_425006 382 18 K. K. NNP 10_1101-2021_01_02_425006 382 19 Baumohl Baumohl NNP 10_1101-2021_01_02_425006 382 20 , , , 10_1101-2021_01_02_425006 382 21 C. C. NNP 10_1101-2021_01_02_425006 382 22 Dylan Dylan NNP 10_1101-2021_01_02_425006 382 23 , , , 10_1101-2021_01_02_425006 382 24 G. G. NNP 10_1101-2021_01_02_425006 382 25 D. D. NNP 10_1101-2021_01_02_425006 382 26 Friedland Friedland NNP 10_1101-2021_01_02_425006 382 27 , , , 10_1101-2021_01_02_425006 382 28 499 499 CD 10_1101-2021_01_02_425006 382 29 K. K. NNP 10_1101-2021_01_02_425006 382 30 H. H. NNP 10_1101-2021_01_02_425006 382 31 Huang Huang NNP 10_1101-2021_01_02_425006 382 32 , , , 10_1101-2021_01_02_425006 382 33 K. K. NNP 10_1101-2021_01_02_425006 382 34 Keith Keith NNP 10_1101-2021_01_02_425006 382 35 , , , 10_1101-2021_01_02_425006 382 36 P. P. NNP 10_1101-2021_01_02_425006 382 37 S. S. 
NNP 10_1101-2021_01_02_425006 382 38 Novichkov Novichkov NNP 10_1101-2021_01_02_425006 382 39 , , , 10_1101-2021_01_02_425006 382 40 MicrobesOnline MicrobesOnline NNP 10_1101-2021_01_02_425006 382 41 : : : 10_1101-2021_01_02_425006 382 42 an an DT 10_1101-2021_01_02_425006 382 43 integrated integrated JJ 10_1101-2021_01_02_425006 382 44 portal portal NN 10_1101-2021_01_02_425006 382 45 for for IN 10_1101-2021_01_02_425006 382 46 comparative comparative JJ 10_1101-2021_01_02_425006 382 47 and and CC 10_1101-2021_01_02_425006 382 48 500 500 CD 10_1101-2021_01_02_425006 382 49 functional functional JJ 10_1101-2021_01_02_425006 382 50 genomics genomic NNS 10_1101-2021_01_02_425006 382 51 . . . 10_1101-2021_01_02_425006 383 1 Nucleic Nucleic NNP 10_1101-2021_01_02_425006 383 2 Acids Acids NNPS 10_1101-2021_01_02_425006 383 3 Res Res NNP 10_1101-2021_01_02_425006 383 4 . . . 10_1101-2021_01_02_425006 384 1 38 38 CD 10_1101-2021_01_02_425006 384 2 , , , 10_1101-2021_01_02_425006 384 3 D396-D400 D396-D400 NNP 10_1101-2021_01_02_425006 384 4 ( ( -LRB- 10_1101-2021_01_02_425006 384 5 2010 2010 CD 10_1101-2021_01_02_425006 384 6 ) ) -RRB- 10_1101-2021_01_02_425006 384 7 . . . 10_1101-2021_01_02_425006 385 1 501 501 CD 10_1101-2021_01_02_425006 385 2 22 22 CD 10_1101-2021_01_02_425006 385 3 . . . 10_1101-2021_01_02_425006 386 1 H. H. NNP 10_1101-2021_01_02_425006 386 2 Cao Cao NNP 10_1101-2021_01_02_425006 386 3 , , , 10_1101-2021_01_02_425006 386 4 Q. Q. NNP 10_1101-2021_01_02_425006 386 5 Ma Ma NNP 10_1101-2021_01_02_425006 386 6 , , , 10_1101-2021_01_02_425006 386 7 X. X. NNP 10_1101-2021_01_02_425006 386 8 Chen Chen NNP 10_1101-2021_01_02_425006 386 9 , , , 10_1101-2021_01_02_425006 386 10 Y. Y. 
NNP 10_1101-2021_01_02_425006 386 11 Xu Xu NNP 10_1101-2021_01_02_425006 386 12 , , , 10_1101-2021_01_02_425006 386 13 DOOR DOOR NNP 10_1101-2021_01_02_425006 386 14 : : : 10_1101-2021_01_02_425006 386 15 a a DT 10_1101-2021_01_02_425006 386 16 prokaryotic prokaryotic JJ 10_1101-2021_01_02_425006 386 17 operon operon NN 10_1101-2021_01_02_425006 386 18 database database NN 10_1101-2021_01_02_425006 386 19 for for IN 10_1101-2021_01_02_425006 386 20 genome genome JJ 10_1101-2021_01_02_425006 386 21 analyses analysis NNS 10_1101-2021_01_02_425006 386 22 and and CC 10_1101-2021_01_02_425006 386 23 502 502 CD 10_1101-2021_01_02_425006 386 24 functional functional JJ 10_1101-2021_01_02_425006 386 25 inference inference NN 10_1101-2021_01_02_425006 386 26 . . . 10_1101-2021_01_02_425006 387 1 Brief brief JJ 10_1101-2021_01_02_425006 387 2 . . . 10_1101-2021_01_02_425006 388 1 Bioinform bioinform NN 10_1101-2021_01_02_425006 388 2 . . . 10_1101-2021_01_02_425006 389 1 20 20 CD 10_1101-2021_01_02_425006 389 2 , , , 10_1101-2021_01_02_425006 389 3 1568 1568 CD 10_1101-2021_01_02_425006 389 4 - - SYM 10_1101-2021_01_02_425006 389 5 1577 1577 CD 10_1101-2021_01_02_425006 389 6 ( ( -LRB- 10_1101-2021_01_02_425006 389 7 2019 2019 CD 10_1101-2021_01_02_425006 389 8 ) ) -RRB- 10_1101-2021_01_02_425006 389 9 . . . 10_1101-2021_01_02_425006 390 1 503 503 CD 10_1101-2021_01_02_425006 390 2 23 23 CD 10_1101-2021_01_02_425006 390 3 . . . 10_1101-2021_01_02_425006 391 1 X. X. NNP 10_1101-2021_01_02_425006 391 2 Mao Mao NNP 10_1101-2021_01_02_425006 391 3 , , , 10_1101-2021_01_02_425006 391 4 Q. Q. NNP 10_1101-2021_01_02_425006 391 5 Ma Ma NNP 10_1101-2021_01_02_425006 391 6 , , , 10_1101-2021_01_02_425006 391 7 C. C. NNP 10_1101-2021_01_02_425006 391 8 Zhou Zhou NNP 10_1101-2021_01_02_425006 391 9 , , , 10_1101-2021_01_02_425006 391 10 X. X. NNP 10_1101-2021_01_02_425006 391 11 Chen Chen NNP 10_1101-2021_01_02_425006 391 12 , , , 10_1101-2021_01_02_425006 391 13 H. H. 
NNP 10_1101-2021_01_02_425006 391 14 Zhang Zhang NNP 10_1101-2021_01_02_425006 391 15 , , , 10_1101-2021_01_02_425006 391 16 J. J. NNP 10_1101-2021_01_02_425006 391 17 Yang Yang NNP 10_1101-2021_01_02_425006 391 18 , , , 10_1101-2021_01_02_425006 391 19 F. F. NNP 10_1101-2021_01_02_425006 391 20 Mao Mao NNP 10_1101-2021_01_02_425006 391 21 , , , 10_1101-2021_01_02_425006 391 22 W. W. NNP 10_1101-2021_01_02_425006 391 23 Lai Lai NNP 10_1101-2021_01_02_425006 391 24 , , , 10_1101-2021_01_02_425006 391 25 Y. Y. NNP 10_1101-2021_01_02_425006 391 26 Xu Xu NNP 10_1101-2021_01_02_425006 391 27 , , , 10_1101-2021_01_02_425006 391 28 DOOR DOOR NNP 10_1101-2021_01_02_425006 391 29 2.0 2.0 CD 10_1101-2021_01_02_425006 391 30 : : : 10_1101-2021_01_02_425006 391 31 presenting present VBG 10_1101-2021_01_02_425006 391 32 504 504 CD 10_1101-2021_01_02_425006 391 33 operons operon NNS 10_1101-2021_01_02_425006 391 34 and and CC 10_1101-2021_01_02_425006 391 35 their -PRON- PRP$ 10_1101-2021_01_02_425006 391 36 functions function NNS 10_1101-2021_01_02_425006 391 37 through through IN 10_1101-2021_01_02_425006 391 38 dynamic dynamic JJ 10_1101-2021_01_02_425006 391 39 and and CC 10_1101-2021_01_02_425006 391 40 integrated integrated JJ 10_1101-2021_01_02_425006 391 41 views view NNS 10_1101-2021_01_02_425006 391 42 . . . 10_1101-2021_01_02_425006 392 1 Nucleic Nucleic NNP 10_1101-2021_01_02_425006 392 2 Acids Acids NNPS 10_1101-2021_01_02_425006 392 3 Res Res NNP 10_1101-2021_01_02_425006 392 4 . . . 10_1101-2021_01_02_425006 393 1 42 42 CD 10_1101-2021_01_02_425006 393 2 , , , 10_1101-2021_01_02_425006 393 3 D654 D654 NNP 10_1101-2021_01_02_425006 393 4 - - HYPH 10_1101-2021_01_02_425006 393 5 505 505 CD 10_1101-2021_01_02_425006 393 6 D659 d659 NN 10_1101-2021_01_02_425006 393 7 ( ( -LRB- 10_1101-2021_01_02_425006 393 8 2013 2013 CD 10_1101-2021_01_02_425006 393 9 ) ) -RRB- 10_1101-2021_01_02_425006 393 10 . . . 
10_1101-2021_01_02_425006 394 1 506 506 CD 10_1101-2021_01_02_425006 394 2 24 24 CD 10_1101-2021_01_02_425006 394 3 . . . 10_1101-2021_01_02_425006 395 1 K. K. NNP 10_1101-2021_01_02_425006 395 2 Chetal Chetal NNP 10_1101-2021_01_02_425006 395 3 , , , 10_1101-2021_01_02_425006 395 4 S. S. NNP 10_1101-2021_01_02_425006 395 5 C. C. NNP 10_1101-2021_01_02_425006 395 6 Janga Janga NNP 10_1101-2021_01_02_425006 395 7 , , , 10_1101-2021_01_02_425006 395 8 OperomeDB OperomeDB NNP 10_1101-2021_01_02_425006 395 9 : : : 10_1101-2021_01_02_425006 395 10 A a DT 10_1101-2021_01_02_425006 395 11 Database Database NNP 10_1101-2021_01_02_425006 395 12 of of IN 10_1101-2021_01_02_425006 395 13 Condition Condition NNP 10_1101-2021_01_02_425006 395 14 - - HYPH 10_1101-2021_01_02_425006 395 15 Specific Specific NNP 10_1101-2021_01_02_425006 395 16 Transcription Transcription NNP 10_1101-2021_01_02_425006 395 17 Units Units NNPS 10_1101-2021_01_02_425006 395 18 in in IN 10_1101-2021_01_02_425006 395 19 507 507 CD 10_1101-2021_01_02_425006 395 20 Prokaryotic Prokaryotic NNP 10_1101-2021_01_02_425006 395 21 Genomes Genomes NNPS 10_1101-2021_01_02_425006 395 22 . . . 10_1101-2021_01_02_425006 396 1 Biomed Biomed NNP 10_1101-2021_01_02_425006 396 2 Research Research NNP 10_1101-2021_01_02_425006 396 3 International International NNP 10_1101-2021_01_02_425006 396 4 2015 2015 CD 10_1101-2021_01_02_425006 396 5 , , , 10_1101-2021_01_02_425006 396 6 1 1 CD 10_1101-2021_01_02_425006 396 7 - - SYM 10_1101-2021_01_02_425006 396 8 10 10 CD 10_1101-2021_01_02_425006 396 9 ( ( -LRB- 10_1101-2021_01_02_425006 396 10 2015 2015 CD 10_1101-2021_01_02_425006 396 11 ) ) -RRB- 10_1101-2021_01_02_425006 396 12 . . . 10_1101-2021_01_02_425006 397 1 508 508 CD 10_1101-2021_01_02_425006 397 2 25 25 CD 10_1101-2021_01_02_425006 397 3 . . . 10_1101-2021_01_02_425006 398 1 J. J. NNP 10_1101-2021_01_02_425006 398 2 Yang Yang NNP 10_1101-2021_01_02_425006 398 3 , , , 10_1101-2021_01_02_425006 398 4 X. X. 
NNP 10_1101-2021_01_02_425006 398 5 Chen Chen NNP 10_1101-2021_01_02_425006 398 6 , , , 10_1101-2021_01_02_425006 398 7 A. A. NNP 10_1101-2021_01_02_425006 398 8 Mcdermaid Mcdermaid NNP 10_1101-2021_01_02_425006 398 9 , , , 10_1101-2021_01_02_425006 398 10 Q. Q. NNP 10_1101-2021_01_02_425006 398 11 Ma Ma NNP 10_1101-2021_01_02_425006 398 12 , , , 10_1101-2021_01_02_425006 398 13 DMINDA DMINDA NNP 10_1101-2021_01_02_425006 398 14 2.0 2.0 CD 10_1101-2021_01_02_425006 398 15 : : : 10_1101-2021_01_02_425006 398 16 integrated integrate VBN 10_1101-2021_01_02_425006 398 17 and and CC 10_1101-2021_01_02_425006 398 18 systematic systematic JJ 10_1101-2021_01_02_425006 398 19 views view NNS 10_1101-2021_01_02_425006 398 20 of of IN 10_1101-2021_01_02_425006 398 21 509 509 CD 10_1101-2021_01_02_425006 398 22 regulatory regulatory JJ 10_1101-2021_01_02_425006 398 23 DNA dna NN 10_1101-2021_01_02_425006 398 24 motif motif NN 10_1101-2021_01_02_425006 398 25 identification identification NN 10_1101-2021_01_02_425006 398 26 and and CC 10_1101-2021_01_02_425006 398 27 analyses analysis NNS 10_1101-2021_01_02_425006 398 28 . . . 10_1101-2021_01_02_425006 399 1 Bioinformatics Bioinformatics NNP 10_1101-2021_01_02_425006 399 2 33 33 CD 10_1101-2021_01_02_425006 399 3 , , , 10_1101-2021_01_02_425006 399 4 2586 2586 CD 10_1101-2021_01_02_425006 399 5 - - SYM 10_1101-2021_01_02_425006 399 6 2588 2588 CD 10_1101-2021_01_02_425006 399 7 ( ( -LRB- 10_1101-2021_01_02_425006 399 8 2017 2017 CD 10_1101-2021_01_02_425006 399 9 ) ) -RRB- 10_1101-2021_01_02_425006 399 10 . . . 10_1101-2021_01_02_425006 400 1 510 510 CD 10_1101-2021_01_02_425006 400 2 26 26 CD 10_1101-2021_01_02_425006 400 3 . . . 10_1101-2021_01_02_425006 401 1 T. T. NNP 10_1101-2021_01_02_425006 401 2 Blanca Blanca NNP 10_1101-2021_01_02_425006 401 3 , , , 10_1101-2021_01_02_425006 401 4 C. C. NNP 10_1101-2021_01_02_425006 401 5 Ricardo Ricardo NNP 10_1101-2021_01_02_425006 401 6 , , , 10_1101-2021_01_02_425006 401 7 C. C. 
NNP 10_1101-2021_01_02_425006 401 8 E. E. NNP 10_1101-2021_01_02_425006 401 9 Martinez Martinez NNP 10_1101-2021_01_02_425006 401 10 - - HYPH 10_1101-2021_01_02_425006 401 11 Guerrero Guerrero NNP 10_1101-2021_01_02_425006 401 12 , , , 10_1101-2021_01_02_425006 401 13 M. M. NNP 10_1101-2021_01_02_425006 401 14 Enrique Enrique NNP 10_1101-2021_01_02_425006 401 15 , , , 10_1101-2021_01_02_425006 401 16 ProOpDB ProOpDB NNP 10_1101-2021_01_02_425006 401 17 : : : 10_1101-2021_01_02_425006 401 18 Prokaryotic Prokaryotic NNP 10_1101-2021_01_02_425006 401 19 Operon Operon NNP 10_1101-2021_01_02_425006 401 20 511 511 CD 10_1101-2021_01_02_425006 401 21 DataBase DataBase NNP 10_1101-2021_01_02_425006 401 22 . . . 10_1101-2021_01_02_425006 402 1 Nucleic Nucleic NNP 10_1101-2021_01_02_425006 402 2 Acids Acids NNPS 10_1101-2021_01_02_425006 402 3 Res Res NNP 10_1101-2021_01_02_425006 402 4 . . . 10_1101-2021_01_02_425006 403 1 40 40 CD 10_1101-2021_01_02_425006 403 2 , , , 10_1101-2021_01_02_425006 403 3 D627-D631 D627-D631 NNP 10_1101-2021_01_02_425006 403 4 ( ( -LRB- 10_1101-2021_01_02_425006 403 5 2012 2012 CD 10_1101-2021_01_02_425006 403 6 ) ) -RRB- 10_1101-2021_01_02_425006 403 7 . . . 10_1101-2021_01_02_425006 404 1 512 512 CD 10_1101-2021_01_02_425006 404 2 27 27 CD 10_1101-2021_01_02_425006 404 3 . . . 10_1101-2021_01_02_425006 405 1 R. R. NNP 10_1101-2021_01_02_425006 405 2 McClure McClure NNP 10_1101-2021_01_02_425006 405 3 , , , 10_1101-2021_01_02_425006 405 4 D. D. NNP 10_1101-2021_01_02_425006 405 5 Balasubramanian Balasubramanian NNP 10_1101-2021_01_02_425006 405 6 , , , 10_1101-2021_01_02_425006 405 7 Y. Y. NNP 10_1101-2021_01_02_425006 406 1 Sun Sun NNP 10_1101-2021_01_02_425006 406 2 , , , 10_1101-2021_01_02_425006 406 3 M. M. NNP 10_1101-2021_01_02_425006 406 4 Bobrovskyy Bobrovskyy NNP 10_1101-2021_01_02_425006 406 5 , , , 10_1101-2021_01_02_425006 406 6 P. P. 
NNP 10_1101-2021_01_02_425006 406 7 Sumby Sumby NNP 10_1101-2021_01_02_425006 406 8 , , , 10_1101-2021_01_02_425006 406 9 C. C. NNP 10_1101-2021_01_02_425006 406 10 A. A. NNP 10_1101-2021_01_02_425006 406 11 Genco Genco NNP 10_1101-2021_01_02_425006 406 12 , , , 10_1101-2021_01_02_425006 406 13 C. C. NNP 10_1101-2021_01_02_425006 406 14 K. K. NNP 10_1101-2021_01_02_425006 406 15 513 513 CD 10_1101-2021_01_02_425006 406 16 Vanderpool Vanderpool NNP 10_1101-2021_01_02_425006 406 17 , , , 10_1101-2021_01_02_425006 406 18 B. B. NNP 10_1101-2021_01_02_425006 406 19 Tjaden Tjaden NNP 10_1101-2021_01_02_425006 406 20 , , , 10_1101-2021_01_02_425006 406 21 Computational computational JJ 10_1101-2021_01_02_425006 406 22 analysis analysis NN 10_1101-2021_01_02_425006 406 23 of of IN 10_1101-2021_01_02_425006 406 24 bacterial bacterial JJ 10_1101-2021_01_02_425006 406 25 RNA RNA NNP 10_1101-2021_01_02_425006 406 26 - - HYPH 10_1101-2021_01_02_425006 406 27 Seq Seq NNP 10_1101-2021_01_02_425006 406 28 data datum NNS 10_1101-2021_01_02_425006 406 29 . . . 10_1101-2021_01_02_425006 407 1 Nucleic Nucleic NNP 10_1101-2021_01_02_425006 407 2 Acids Acids NNPS 10_1101-2021_01_02_425006 407 3 Res Res NNP 10_1101-2021_01_02_425006 407 4 . . . 
10_1101-2021_01_02_425006 408 1 41 41 CD 10_1101-2021_01_02_425006 408 2 , , , 10_1101-2021_01_02_425006 408 3 514 514 CD 
10_1101-2021_01_02_425006 409 25 e140-e140 e140-e140 NNS 10_1101-2021_01_02_425006 409 26 ( ( -LRB- 10_1101-2021_01_02_425006 409 27 2013 2013 CD 10_1101-2021_01_02_425006 409 28 ) ) -RRB- 10_1101-2021_01_02_425006 409 29 . . . 10_1101-2021_01_02_425006 410 1 515 515 CD 10_1101-2021_01_02_425006 410 2 28 28 CD 10_1101-2021_01_02_425006 410 3 . . . 10_1101-2021_01_02_425006 411 1 X. X. NNP 10_1101-2021_01_02_425006 411 2 Chen Chen NNP 10_1101-2021_01_02_425006 411 3 , , , 10_1101-2021_01_02_425006 411 4 W. W. NNP 10_1101-2021_01_02_425006 411 5 Chou Chou NNP 10_1101-2021_01_02_425006 411 6 , , , 10_1101-2021_01_02_425006 411 7 Q. Q. 
NNP 10_1101-2021_01_02_425006 411 8 Ma Ma NNP 10_1101-2021_01_02_425006 411 9 , , , 10_1101-2021_01_02_425006 411 10 Y. Y. NNP 10_1101-2021_01_02_425006 411 11 Xu Xu NNP 10_1101-2021_01_02_425006 411 12 , , , 10_1101-2021_01_02_425006 411 13 SeqTU SeqTU NNP 10_1101-2021_01_02_425006 411 14 : : : 10_1101-2021_01_02_425006 411 15 A a DT 10_1101-2021_01_02_425006 411 16 Web web NN 10_1101-2021_01_02_425006 411 17 Server server NN 10_1101-2021_01_02_425006 411 18 for for IN 10_1101-2021_01_02_425006 411 19 Identification Identification NNP 10_1101-2021_01_02_425006 411 20 of of IN 10_1101-2021_01_02_425006 411 21 Bacterial Bacterial NNP 10_1101-2021_01_02_425006 411 22 516 516 CD 10_1101-2021_01_02_425006 411 23 Transcription Transcription NNP 10_1101-2021_01_02_425006 411 24 Units Units NNPS 10_1101-2021_01_02_425006 411 25 . . . 10_1101-2021_01_02_425006 412 1 Scientific Scientific NNP 10_1101-2021_01_02_425006 412 2 Reports Reports NNPS 10_1101-2021_01_02_425006 412 3 7 7 CD 10_1101-2021_01_02_425006 412 4 , , , 10_1101-2021_01_02_425006 412 5 43925 43925 CD 10_1101-2021_01_02_425006 412 6 ( ( -LRB- 10_1101-2021_01_02_425006 412 7 2017 2017 CD 10_1101-2021_01_02_425006 412 8 ) ) -RRB- 10_1101-2021_01_02_425006 412 9 . . . 10_1101-2021_01_02_425006 413 1 517 517 CD 10_1101-2021_01_02_425006 413 2 29 29 CD 10_1101-2021_01_02_425006 413 3 . . . 10_1101-2021_01_02_425006 414 1 I. I. NNP 10_1101-2021_01_02_425006 415 1 A. A. NNP 10_1101-2021_01_02_425006 415 2 Garanina Garanina NNP 10_1101-2021_01_02_425006 415 3 , , , 10_1101-2021_01_02_425006 415 4 G. G. NNP 10_1101-2021_01_02_425006 415 5 Y. Y. NNP 10_1101-2021_01_02_425006 415 6 Fisunov Fisunov NNP 10_1101-2021_01_02_425006 415 7 , , , 10_1101-2021_01_02_425006 415 8 V. V. NNP 10_1101-2021_01_02_425006 415 9 M. M. 
NNP 10_1101-2021_01_02_425006 415 10 Govorun Govorun NNP 10_1101-2021_01_02_425006 415 11 , , , 10_1101-2021_01_02_425006 415 12 BAC BAC NNP 10_1101-2021_01_02_425006 415 13 - - HYPH 10_1101-2021_01_02_425006 415 14 BROWSER BROWSER NNP 10_1101-2021_01_02_425006 415 15 : : : 10_1101-2021_01_02_425006 415 16 The the DT 10_1101-2021_01_02_425006 415 17 Tool Tool NNP 10_1101-2021_01_02_425006 415 18 for for IN 10_1101-2021_01_02_425006 415 19 Visualization Visualization NNP 10_1101-2021_01_02_425006 415 20 and and CC 10_1101-2021_01_02_425006 415 21 518 518 CD 10_1101-2021_01_02_425006 415 22 Analysis Analysis NNP 10_1101-2021_01_02_425006 415 23 of of IN 10_1101-2021_01_02_425006 415 24 Prokaryotic Prokaryotic NNP 10_1101-2021_01_02_425006 415 25 Genomes Genomes NNPS 10_1101-2021_01_02_425006 415 26 . . . 10_1101-2021_01_02_425006 416 1 Frontiers frontier NNS 10_1101-2021_01_02_425006 416 2 in in IN 10_1101-2021_01_02_425006 416 3 Microbiology Microbiology NNP 10_1101-2021_01_02_425006 416 4 9 9 CD 10_1101-2021_01_02_425006 416 5 , , , 10_1101-2021_01_02_425006 416 6 2827 2827 CD 10_1101-2021_01_02_425006 416 7 ( ( -LRB- 10_1101-2021_01_02_425006 416 8 2018 2018 CD 10_1101-2021_01_02_425006 416 9 ) ) -RRB- 10_1101-2021_01_02_425006 416 10 . . . 10_1101-2021_01_02_425006 417 1 519 519 CD 10_1101-2021_01_02_425006 417 2 30 30 CD 10_1101-2021_01_02_425006 417 3 . . . 10_1101-2021_01_02_425006 418 1 B. B. NNP 10_1101-2021_01_02_425006 418 2 Taboada Taboada NNP 10_1101-2021_01_02_425006 418 3 , , , 10_1101-2021_01_02_425006 418 4 K. K. NNP 10_1101-2021_01_02_425006 418 5 Estrada Estrada NNP 10_1101-2021_01_02_425006 418 6 , , , 10_1101-2021_01_02_425006 418 7 R. R. NNP 10_1101-2021_01_02_425006 418 8 Ciria Ciria NNP 10_1101-2021_01_02_425006 418 9 , , , 10_1101-2021_01_02_425006 418 10 E. E. 
NNP 10_1101-2021_01_02_425006 418 11 Merino Merino NNP 10_1101-2021_01_02_425006 418 12 , , , 10_1101-2021_01_02_425006 418 13 Operon Operon NNP 10_1101-2021_01_02_425006 418 14 - - HYPH 10_1101-2021_01_02_425006 418 15 mapper mapper NNP 10_1101-2021_01_02_425006 418 16 : : : 10_1101-2021_01_02_425006 418 17 a a DT 10_1101-2021_01_02_425006 418 18 web web NN 10_1101-2021_01_02_425006 418 19 server server NN 10_1101-2021_01_02_425006 418 20 for for IN 10_1101-2021_01_02_425006 418 21 precise precise JJ 10_1101-2021_01_02_425006 418 22 operon operon NNP 10_1101-2021_01_02_425006 418 23 520 520 CD 10_1101-2021_01_02_425006 418 24 identification identification NN 10_1101-2021_01_02_425006 418 25 in in IN 10_1101-2021_01_02_425006 418 26 bacterial bacterial JJ 10_1101-2021_01_02_425006 418 27 and and CC 10_1101-2021_01_02_425006 418 28 archaeal archaeal JJ 10_1101-2021_01_02_425006 418 29 genomes genome NNS 10_1101-2021_01_02_425006 418 30 . . . 10_1101-2021_01_02_425006 419 1 Bioinformatics Bioinformatics NNP 10_1101-2021_01_02_425006 419 2 34 34 CD 10_1101-2021_01_02_425006 419 3 , , , 10_1101-2021_01_02_425006 419 4 4118 4118 CD 10_1101-2021_01_02_425006 419 5 - - SYM 10_1101-2021_01_02_425006 419 6 4120 4120 CD 10_1101-2021_01_02_425006 419 7 ( ( -LRB- 10_1101-2021_01_02_425006 419 8 2018 2018 CD 10_1101-2021_01_02_425006 419 9 ) ) -RRB- 10_1101-2021_01_02_425006 419 10 . . . 10_1101-2021_01_02_425006 420 1 521 521 CD 10_1101-2021_01_02_425006 420 2 31 31 CD 10_1101-2021_01_02_425006 420 3 . . . 10_1101-2021_01_02_425006 421 1 H. H. NNP 10_1101-2021_01_02_425006 421 2 Li Li NNP 10_1101-2021_01_02_425006 421 3 , , , 10_1101-2021_01_02_425006 421 4 R. R. 
NNP 10_1101-2021_01_02_425006 421 5 Durbin Durbin NNP 10_1101-2021_01_02_425006 421 6 , , , 10_1101-2021_01_02_425006 421 7 Fast Fast NNP 10_1101-2021_01_02_425006 421 8 and and CC 10_1101-2021_01_02_425006 421 9 accurate accurate JJ 10_1101-2021_01_02_425006 421 10 short short JJ 10_1101-2021_01_02_425006 421 11 read read NN 10_1101-2021_01_02_425006 421 12 alignment alignment NN 10_1101-2021_01_02_425006 421 13 with with IN 10_1101-2021_01_02_425006 421 14 Burrows Burrows NNP 10_1101-2021_01_02_425006 421 15 – – : 10_1101-2021_01_02_425006 421 16 Wheeler Wheeler NNP 10_1101-2021_01_02_425006 421 17 transform transform NN 10_1101-2021_01_02_425006 421 18 . . . 10_1101-2021_01_02_425006 422 1 522 522 CD 10_1101-2021_01_02_425006 422 2 Bioinformatics bioinformatic NNS 10_1101-2021_01_02_425006 422 3 25 25 CD 10_1101-2021_01_02_425006 422 4 , , , 10_1101-2021_01_02_425006 422 5 1754 1754 CD 10_1101-2021_01_02_425006 422 6 - - SYM 10_1101-2021_01_02_425006 422 7 1760 1760 CD 10_1101-2021_01_02_425006 422 8 ( ( -LRB- 10_1101-2021_01_02_425006 422 9 2009 2009 CD 10_1101-2021_01_02_425006 422 10 ) ) -RRB- 10_1101-2021_01_02_425006 422 11 . . . 10_1101-2021_01_02_425006 423 1 523 523 CD 10_1101-2021_01_02_425006 423 2 32 32 CD 10_1101-2021_01_02_425006 423 3 . . . 10_1101-2021_01_02_425006 424 1 Z. Z. NNP 10_1101-2021_01_02_425006 424 2 Wu Wu NNP 10_1101-2021_01_02_425006 424 3 , , , 10_1101-2021_01_02_425006 424 4 X. X. NNP 10_1101-2021_01_02_425006 424 5 Wang Wang NNP 10_1101-2021_01_02_425006 424 6 , , , 10_1101-2021_01_02_425006 424 7 X. X. 
NNP 10_1101-2021_01_02_425006 424 8 Zhang Zhang NNP 10_1101-2021_01_02_425006 424 9 , , , 10_1101-2021_01_02_425006 424 10 Using use VBG 10_1101-2021_01_02_425006 424 11 non non JJ 10_1101-2021_01_02_425006 424 12 - - JJ 10_1101-2021_01_02_425006 424 13 uniform uniform JJ 10_1101-2021_01_02_425006 424 14 read read VBD 10_1101-2021_01_02_425006 424 15 distribution distribution NN 10_1101-2021_01_02_425006 424 16 models model NNS 10_1101-2021_01_02_425006 424 17 to to TO 10_1101-2021_01_02_425006 424 18 improve improve VB 10_1101-2021_01_02_425006 424 19 isoform isoform NN 10_1101-2021_01_02_425006 424 20 524 524 CD 10_1101-2021_01_02_425006 424 21 expression expression NN 10_1101-2021_01_02_425006 424 22 inference inference NN 10_1101-2021_01_02_425006 424 23 in in IN 10_1101-2021_01_02_425006 424 24 RNA RNA NNP 10_1101-2021_01_02_425006 424 25 - - HYPH 10_1101-2021_01_02_425006 424 26 Seq Seq NNP 10_1101-2021_01_02_425006 424 27 . . . 10_1101-2021_01_02_425006 425 1 Bioinformatics Bioinformatics NNP 10_1101-2021_01_02_425006 425 2 27 27 CD 10_1101-2021_01_02_425006 425 3 , , , 10_1101-2021_01_02_425006 425 4 502 502 CD 10_1101-2021_01_02_425006 425 5 - - HYPH 10_1101-2021_01_02_425006 425 6 508 508 CD 10_1101-2021_01_02_425006 425 7 ( ( -LRB- 10_1101-2021_01_02_425006 425 8 2011 2011 CD 10_1101-2021_01_02_425006 425 9 ) ) -RRB- 10_1101-2021_01_02_425006 425 10 . . . 10_1101-2021_01_02_425006 426 1 525 525 CD 10_1101-2021_01_02_425006 426 2 33 33 CD 10_1101-2021_01_02_425006 426 3 . . . 10_1101-2021_01_02_425006 427 1 A. A. NNP 10_1101-2021_01_02_425006 427 2 Roberts Roberts NNP 10_1101-2021_01_02_425006 427 3 , , , 10_1101-2021_01_02_425006 427 4 C. C. NNP 10_1101-2021_01_02_425006 427 5 Trapnell Trapnell NNP 10_1101-2021_01_02_425006 427 6 , , , 10_1101-2021_01_02_425006 427 7 J. J. NNP 10_1101-2021_01_02_425006 427 8 Donaghey Donaghey NNP 10_1101-2021_01_02_425006 427 9 , , , 10_1101-2021_01_02_425006 427 10 J. J. NNP 10_1101-2021_01_02_425006 427 11 L. L. 
NNP 10_1101-2021_01_02_425006 427 12 Rinn Rinn NNP 10_1101-2021_01_02_425006 427 13 , , , 10_1101-2021_01_02_425006 427 14 L. L. NNP 10_1101-2021_01_02_425006 427 15 Pachter Pachter NNP 10_1101-2021_01_02_425006 427 16 , , , 10_1101-2021_01_02_425006 427 17 Improving Improving NNP 10_1101-2021_01_02_425006 427 18 RNA RNA NNP 10_1101-2021_01_02_425006 427 19 - - HYPH 10_1101-2021_01_02_425006 427 20 Seq Seq NNP 10_1101-2021_01_02_425006 427 21 expression expression NN 10_1101-2021_01_02_425006 427 22 526 526 CD 10_1101-2021_01_02_425006 427 23 estimates estimate NNS 10_1101-2021_01_02_425006 427 24 by by IN 10_1101-2021_01_02_425006 427 25 correcting correct VBG 10_1101-2021_01_02_425006 427 26 for for IN 10_1101-2021_01_02_425006 427 27 fragment fragment NN 10_1101-2021_01_02_425006 427 28 bias bias NN 10_1101-2021_01_02_425006 427 29 . . . 10_1101-2021_01_02_425006 428 1 Genome Genome NNP 10_1101-2021_01_02_425006 428 2 Biol Biol NNP 10_1101-2021_01_02_425006 428 3 . . . 10_1101-2021_01_02_425006 429 1 12 12 CD 10_1101-2021_01_02_425006 429 2 , , , 10_1101-2021_01_02_425006 429 3 1 1 CD 10_1101-2021_01_02_425006 429 4 - - SYM 10_1101-2021_01_02_425006 429 5 14 14 CD 10_1101-2021_01_02_425006 429 6 ( ( -LRB- 10_1101-2021_01_02_425006 429 7 2011 2011 CD 10_1101-2021_01_02_425006 429 8 ) ) -RRB- 10_1101-2021_01_02_425006 429 9 . . . 10_1101-2021_01_02_425006 430 1 527 527 CD 10_1101-2021_01_02_425006 430 2 34 34 CD 10_1101-2021_01_02_425006 430 3 . . . 10_1101-2021_01_02_425006 431 1 R. R. NNP 10_1101-2021_01_02_425006 431 2 Bohnert Bohnert NNP 10_1101-2021_01_02_425006 431 3 , , , 10_1101-2021_01_02_425006 431 4 G. G. NNP 10_1101-2021_01_02_425006 431 5 Rätsch Rätsch NNP 10_1101-2021_01_02_425006 431 8 , , , 10_1101-2021_01_02_425006 431 9 rQuant rQuant NNP 10_1101-2021_01_02_425006 431 10 . . . 
10_1101-2021_01_02_425006 432 1 web web NN 10_1101-2021_01_02_425006 432 2 : : : 10_1101-2021_01_02_425006 432 3 a a DT 10_1101-2021_01_02_425006 432 4 tool tool NN 10_1101-2021_01_02_425006 432 5 for for IN 10_1101-2021_01_02_425006 432 6 RNA RNA NNP 10_1101-2021_01_02_425006 432 7 - - HYPH 10_1101-2021_01_02_425006 432 8 Seq Seq NNP 10_1101-2021_01_02_425006 432 9 - - HYPH 10_1101-2021_01_02_425006 432 10 based base VBN 10_1101-2021_01_02_425006 432 11 transcript transcript NN 10_1101-2021_01_02_425006 432 12 quantitation quantitation NN 10_1101-2021_01_02_425006 432 13 . . . 10_1101-2021_01_02_425006 433 1 Nucleic Nucleic NNP 10_1101-2021_01_02_425006 433 2 528 528 CD 10_1101-2021_01_02_425006 433 3 Acids Acids NNPS 10_1101-2021_01_02_425006 433 4 Res Res NNP 10_1101-2021_01_02_425006 433 5 . . . 10_1101-2021_01_02_425006 434 1 38 38 CD 10_1101-2021_01_02_425006 434 2 , , , 10_1101-2021_01_02_425006 434 3 W348-W351 W348-W351 NNP 10_1101-2021_01_02_425006 434 4 ( ( -LRB- 10_1101-2021_01_02_425006 434 5 2010 2010 CD 10_1101-2021_01_02_425006 434 6 ) ) -RRB- 10_1101-2021_01_02_425006 434 7 . . . 10_1101-2021_01_02_425006 435 1 529 529 CD 10_1101-2021_01_02_425006 435 2 35 35 CD 10_1101-2021_01_02_425006 435 3 . . . 10_1101-2021_01_02_425006 436 1 W. W. NNP 10_1101-2021_01_02_425006 436 2 Li Li NNP 10_1101-2021_01_02_425006 436 3 , , , 10_1101-2021_01_02_425006 436 4 T. T. 
NNP 10_1101-2021_01_02_425006 436 5 Jiang Jiang NNP 10_1101-2021_01_02_425006 436 6 , , , 10_1101-2021_01_02_425006 436 7 Transcriptome Transcriptome NNP 10_1101-2021_01_02_425006 436 8 assembly assembly NN 10_1101-2021_01_02_425006 436 9 and and CC 10_1101-2021_01_02_425006 436 10 isoform isoform NN 10_1101-2021_01_02_425006 436 11 expression expression NN 10_1101-2021_01_02_425006 436 12 level level NN 10_1101-2021_01_02_425006 436 13 estimation estimation NN 10_1101-2021_01_02_425006 436 14 from from IN 10_1101-2021_01_02_425006 436 15 biased biased JJ 10_1101-2021_01_02_425006 436 16 530 530 CD 10_1101-2021_01_02_425006 436 17 RNA RNA NNP 10_1101-2021_01_02_425006 436 18 - - HYPH 10_1101-2021_01_02_425006 436 19 Seq Seq NNP 10_1101-2021_01_02_425006 436 20 reads read VBZ 10_1101-2021_01_02_425006 436 21 . . . 10_1101-2021_01_02_425006 437 1 Bioinformatics Bioinformatics NNP 10_1101-2021_01_02_425006 437 2 28 28 CD 10_1101-2021_01_02_425006 437 3 , , , 10_1101-2021_01_02_425006 437 4 2914 2914 CD 10_1101-2021_01_02_425006 437 5 - - SYM 10_1101-2021_01_02_425006 437 6 2921 2921 CD 10_1101-2021_01_02_425006 437 7 ( ( -LRB- 10_1101-2021_01_02_425006 437 8 2012 2012 CD 10_1101-2021_01_02_425006 437 9 ) ) -RRB- 10_1101-2021_01_02_425006 437 10 . . . 
10_1101-2021_01_02_425006 438 1 531 531 CD 
10_1101-2021_01_02_425006 439 25 36 36 CD 10_1101-2021_01_02_425006 439 26 . . . 10_1101-2021_01_02_425006 440 1 B. B. NNP 10_1101-2021_01_02_425006 440 2 Xiong Xiong NNP 10_1101-2021_01_02_425006 440 3 , , , 10_1101-2021_01_02_425006 440 4 Y. Y. NNP 10_1101-2021_01_02_425006 440 5 Yang Yang NNP 10_1101-2021_01_02_425006 440 6 , , , 10_1101-2021_01_02_425006 440 7 F. F. NNP 10_1101-2021_01_02_425006 440 8 R. R. NNP 10_1101-2021_01_02_425006 440 9 Fineis Fineis NNP 10_1101-2021_01_02_425006 440 10 , , , 10_1101-2021_01_02_425006 440 11 J.-P. J.-P. 
NNP 10_1101-2021_01_02_425006 440 12 Wang Wang NNP 10_1101-2021_01_02_425006 440 13 , , , 10_1101-2021_01_02_425006 440 14 DegNorm DegNorm NNP 10_1101-2021_01_02_425006 440 15 : : : 10_1101-2021_01_02_425006 440 16 normalization normalization NN 10_1101-2021_01_02_425006 440 17 of of IN 10_1101-2021_01_02_425006 440 18 generalized generalize VBN 10_1101-2021_01_02_425006 440 19 transcript transcript NNP 10_1101-2021_01_02_425006 440 20 532 532 CD 10_1101-2021_01_02_425006 440 21 degradation degradation NN 10_1101-2021_01_02_425006 440 22 improves improve VBZ 10_1101-2021_01_02_425006 440 23 accuracy accuracy NN 10_1101-2021_01_02_425006 440 24 in in IN 10_1101-2021_01_02_425006 440 25 RNA RNA NNP 10_1101-2021_01_02_425006 440 26 - - HYPH 10_1101-2021_01_02_425006 440 27 seq seq NN 10_1101-2021_01_02_425006 440 28 analysis analysis NN 10_1101-2021_01_02_425006 440 29 . . . 10_1101-2021_01_02_425006 441 1 Genome Genome NNP 10_1101-2021_01_02_425006 441 2 Biol Biol NNP 10_1101-2021_01_02_425006 441 3 . . . 10_1101-2021_01_02_425006 442 1 20 20 CD 10_1101-2021_01_02_425006 442 2 , , , 10_1101-2021_01_02_425006 442 3 75 75 CD 10_1101-2021_01_02_425006 442 4 ( ( -LRB- 10_1101-2021_01_02_425006 442 5 2019 2019 CD 10_1101-2021_01_02_425006 442 6 ) ) -RRB- 10_1101-2021_01_02_425006 442 7 . . . 10_1101-2021_01_02_425006 443 1 533 533 CD 10_1101-2021_01_02_425006 443 2 37 37 CD 10_1101-2021_01_02_425006 443 3 . . . 10_1101-2021_01_02_425006 444 1 J. J. NNP 10_1101-2021_01_02_425006 444 2 Chaitanya Chaitanya NNP 10_1101-2021_01_02_425006 444 3 , , , 10_1101-2021_01_02_425006 444 4 Degradation Degradation NNP 10_1101-2021_01_02_425006 444 5 of of IN 10_1101-2021_01_02_425006 444 6 mRNA mRNA NNS 10_1101-2021_01_02_425006 444 7 in in IN 10_1101-2021_01_02_425006 444 8 Escherichia Escherichia NNP 10_1101-2021_01_02_425006 444 9 coli coli NNS 10_1101-2021_01_02_425006 444 10 . . . 
10_1101-2021_01_02_425006 445 1 IUBMB IUBMB NNP 10_1101-2021_01_02_425006 445 2 Life Life NNP 10_1101-2021_01_02_425006 445 3 54 54 CD 10_1101-2021_01_02_425006 445 4 , , , 10_1101-2021_01_02_425006 445 5 315 315 CD 10_1101-2021_01_02_425006 445 6 - - SYM 10_1101-2021_01_02_425006 445 7 321 321 CD 10_1101-2021_01_02_425006 445 8 ( ( -LRB- 10_1101-2021_01_02_425006 445 9 2010 2010 CD 10_1101-2021_01_02_425006 445 10 ) ) -RRB- 10_1101-2021_01_02_425006 445 11 . . . 10_1101-2021_01_02_425006 446 1 534 534 CD 10_1101-2021_01_02_425006 446 2 38 38 CD 10_1101-2021_01_02_425006 446 3 . . . 10_1101-2021_01_02_425006 447 1 X. X. NNP 10_1101-2021_01_02_425006 447 2 Mao Mao NNP 10_1101-2021_01_02_425006 447 3 , , , 10_1101-2021_01_02_425006 447 4 Q. Q. NNP 10_1101-2021_01_02_425006 447 5 Ma Ma NNP 10_1101-2021_01_02_425006 447 6 , , , 10_1101-2021_01_02_425006 447 7 B. B. NNP 10_1101-2021_01_02_425006 447 8 Liu Liu NNP 10_1101-2021_01_02_425006 447 9 , , , 10_1101-2021_01_02_425006 447 10 X. X. NNP 10_1101-2021_01_02_425006 447 11 Chen Chen NNP 10_1101-2021_01_02_425006 447 12 , , , 10_1101-2021_01_02_425006 447 13 H. H. NNP 10_1101-2021_01_02_425006 447 14 Zhang Zhang NNP 10_1101-2021_01_02_425006 447 15 , , , 10_1101-2021_01_02_425006 447 16 Y. Y. 
NNP 10_1101-2021_01_02_425006 447 17 Xu Xu NNP 10_1101-2021_01_02_425006 447 18 , , , 10_1101-2021_01_02_425006 447 19 Revisiting Revisiting NNP 10_1101-2021_01_02_425006 447 20 operons operon NNS 10_1101-2021_01_02_425006 447 21 : : : 10_1101-2021_01_02_425006 447 22 an an DT 10_1101-2021_01_02_425006 447 23 analysis analysis NN 10_1101-2021_01_02_425006 447 24 of of IN 10_1101-2021_01_02_425006 447 25 the the DT 10_1101-2021_01_02_425006 447 26 landscape landscape NN 10_1101-2021_01_02_425006 447 27 535 535 CD 10_1101-2021_01_02_425006 447 28 of of IN 10_1101-2021_01_02_425006 447 29 transcriptional transcriptional JJ 10_1101-2021_01_02_425006 447 30 units unit NNS 10_1101-2021_01_02_425006 447 31 in in IN 10_1101-2021_01_02_425006 447 32 E. E. NNP 10_1101-2021_01_02_425006 447 33 coli coli NNS 10_1101-2021_01_02_425006 447 34 . . . 10_1101-2021_01_02_425006 448 1 BMC BMC NNP 10_1101-2021_01_02_425006 448 2 Bioinformatics Bioinformatics NNP 10_1101-2021_01_02_425006 448 3 16 16 CD 10_1101-2021_01_02_425006 448 4 , , , 10_1101-2021_01_02_425006 448 5 356 356 CD 10_1101-2021_01_02_425006 448 6 ( ( -LRB- 10_1101-2021_01_02_425006 448 7 2015 2015 CD 10_1101-2021_01_02_425006 448 8 ) ) -RRB- 10_1101-2021_01_02_425006 448 9 . . . 10_1101-2021_01_02_425006 449 1 536 536 CD 10_1101-2021_01_02_425006 449 2 39 39 CD 10_1101-2021_01_02_425006 449 3 . . . 10_1101-2021_01_02_425006 450 1 B. B. NNP 10_1101-2021_01_02_425006 450 2 Marie Marie NNP 10_1101-2021_01_02_425006 450 3 , , , 10_1101-2021_01_02_425006 450 4 K. K. NNP 10_1101-2021_01_02_425006 450 5 H. H. NNP 10_1101-2021_01_02_425006 450 6 Thilo Thilo NNP 10_1101-2021_01_02_425006 450 7 , , , 10_1101-2021_01_02_425006 450 8 F. F. NNP 10_1101-2021_01_02_425006 450 9 Thierry Thierry NNP 10_1101-2021_01_02_425006 450 10 , , , 10_1101-2021_01_02_425006 450 11 T. T. NNP 10_1101-2021_01_02_425006 450 12 Mikael Mikael NNP 10_1101-2021_01_02_425006 450 13 , , , 10_1101-2021_01_02_425006 450 14 R. R. 
NNP 10_1101-2021_01_02_425006 450 15 Adriana Adriana NNP 10_1101-2021_01_02_425006 450 16 , , , 10_1101-2021_01_02_425006 450 17 V. V. NNP 10_1101-2021_01_02_425006 450 18 D. D. NNP 10_1101-2021_01_02_425006 450 19 Christian Christian NNP 10_1101-2021_01_02_425006 450 20 , , , 10_1101-2021_01_02_425006 450 21 Metabolic metabolic JJ 10_1101-2021_01_02_425006 450 22 pathways pathway NNS 10_1101-2021_01_02_425006 450 23 of of IN 10_1101-2021_01_02_425006 450 24 537 537 CD 10_1101-2021_01_02_425006 450 25 Pseudomonas Pseudomonas NNP 10_1101-2021_01_02_425006 450 26 aeruginosa aeruginosa RB 10_1101-2021_01_02_425006 450 27 involved involve VBN 10_1101-2021_01_02_425006 450 28 in in IN 10_1101-2021_01_02_425006 450 29 competition competition NN 10_1101-2021_01_02_425006 450 30 with with IN 10_1101-2021_01_02_425006 450 31 respiratory respiratory JJ 10_1101-2021_01_02_425006 450 32 bacterial bacterial JJ 10_1101-2021_01_02_425006 450 33 pathogens pathogen NNS 10_1101-2021_01_02_425006 450 34 . . . 10_1101-2021_01_02_425006 451 1 Frontiers frontier NNS 10_1101-2021_01_02_425006 451 2 538 538 CD 10_1101-2021_01_02_425006 451 3 in in IN 10_1101-2021_01_02_425006 451 4 Microbiology Microbiology NNP 10_1101-2021_01_02_425006 451 5 6 6 CD 10_1101-2021_01_02_425006 451 6 , , , 10_1101-2021_01_02_425006 451 7 321 321 CD 10_1101-2021_01_02_425006 451 8 ( ( -LRB- 10_1101-2021_01_02_425006 451 9 2015 2015 CD 10_1101-2021_01_02_425006 451 10 ) ) -RRB- 10_1101-2021_01_02_425006 451 11 . . . 10_1101-2021_01_02_425006 452 1 539 539 CD 10_1101-2021_01_02_425006 452 2 40 40 CD 10_1101-2021_01_02_425006 452 3 . . . 10_1101-2021_01_02_425006 453 1 C. C. NNP 10_1101-2021_01_02_425006 453 2 Nadiras Nadiras NNP 10_1101-2021_01_02_425006 453 3 , , , 10_1101-2021_01_02_425006 453 4 E. E. NNP 10_1101-2021_01_02_425006 453 5 Eveno Eveno NNP 10_1101-2021_01_02_425006 453 6 , , , 10_1101-2021_01_02_425006 453 7 A. A. 
NNP 10_1101-2021_01_02_425006 453 8 Schwartz Schwartz NNP 10_1101-2021_01_02_425006 453 9 , , , 10_1101-2021_01_02_425006 453 10 N. N. NNP 10_1101-2021_01_02_425006 453 11 Figueroa Figueroa NNP 10_1101-2021_01_02_425006 453 12 - - HYPH 10_1101-2021_01_02_425006 453 13 Bossi Bossi NNP 10_1101-2021_01_02_425006 453 14 , , , 10_1101-2021_01_02_425006 453 15 M. M. NNP 10_1101-2021_01_02_425006 453 16 Boudvillain Boudvillain NNP 10_1101-2021_01_02_425006 453 17 , , , 10_1101-2021_01_02_425006 453 18 A a DT 10_1101-2021_01_02_425006 453 19 multivariate multivariate JJ 10_1101-2021_01_02_425006 453 20 prediction prediction NN 10_1101-2021_01_02_425006 453 21 540 540 CD 10_1101-2021_01_02_425006 453 22 model model NN 10_1101-2021_01_02_425006 453 23 for for IN 10_1101-2021_01_02_425006 453 24 Rho Rho NNP 10_1101-2021_01_02_425006 453 25 - - HYPH 10_1101-2021_01_02_425006 453 26 dependent dependent JJ 10_1101-2021_01_02_425006 453 27 termination termination NN 10_1101-2021_01_02_425006 453 28 of of IN 10_1101-2021_01_02_425006 453 29 transcription transcription NN 10_1101-2021_01_02_425006 453 30 . . . 10_1101-2021_01_02_425006 454 1 Nucleic Nucleic NNP 10_1101-2021_01_02_425006 454 2 Acids Acids NNPS 10_1101-2021_01_02_425006 454 3 Res Res NNP 10_1101-2021_01_02_425006 454 4 . . . 10_1101-2021_01_02_425006 455 1 46 46 CD 10_1101-2021_01_02_425006 455 2 , , , 10_1101-2021_01_02_425006 455 3 8245 8245 CD 10_1101-2021_01_02_425006 455 4 - - SYM 10_1101-2021_01_02_425006 455 5 8260 8260 CD 10_1101-2021_01_02_425006 455 6 ( ( -LRB- 10_1101-2021_01_02_425006 455 7 2018 2018 CD 10_1101-2021_01_02_425006 455 8 ) ) -RRB- 10_1101-2021_01_02_425006 455 9 . . . 10_1101-2021_01_02_425006 456 1 541 541 CD 10_1101-2021_01_02_425006 456 2 41 41 CD 10_1101-2021_01_02_425006 456 3 . . . 10_1101-2021_01_02_425006 457 1 C. C. NNP 10_1101-2021_01_02_425006 457 2 L. L. 
NNP 10_1101-2021_01_02_425006 457 3 Kingsford Kingsford NNP 10_1101-2021_01_02_425006 457 4 , , , 10_1101-2021_01_02_425006 457 5 K. K. NNP 10_1101-2021_01_02_425006 457 6 Ayanbule Ayanbule NNP 10_1101-2021_01_02_425006 457 7 , , , 10_1101-2021_01_02_425006 457 8 S. S. NNP 10_1101-2021_01_02_425006 457 9 L. L. NNP 10_1101-2021_01_02_425006 457 10 Salzberg Salzberg NNP 10_1101-2021_01_02_425006 457 11 , , , 10_1101-2021_01_02_425006 457 12 Rapid Rapid NNP 10_1101-2021_01_02_425006 457 13 , , , 10_1101-2021_01_02_425006 457 14 accurate accurate JJ 10_1101-2021_01_02_425006 457 15 , , , 10_1101-2021_01_02_425006 457 16 computational computational JJ 10_1101-2021_01_02_425006 457 17 discovery discovery NN 10_1101-2021_01_02_425006 457 18 of of IN 10_1101-2021_01_02_425006 457 19 Rho-542 rho-542 CD 10_1101-2021_01_02_425006 457 20 independent independent JJ 10_1101-2021_01_02_425006 457 21 transcription transcription NN 10_1101-2021_01_02_425006 457 22 terminators terminator NNS 10_1101-2021_01_02_425006 457 23 illuminates illuminate VBZ 10_1101-2021_01_02_425006 457 24 their -PRON- PRP$ 10_1101-2021_01_02_425006 457 25 relationship relationship NN 10_1101-2021_01_02_425006 457 26 to to IN 10_1101-2021_01_02_425006 457 27 DNA DNA NNP 10_1101-2021_01_02_425006 457 28 uptake uptake NN 10_1101-2021_01_02_425006 457 29 . . . 10_1101-2021_01_02_425006 458 1 Genome Genome NNP 10_1101-2021_01_02_425006 458 2 Biol Biol NNP 10_1101-2021_01_02_425006 458 3 . . . 10_1101-2021_01_02_425006 459 1 543 543 CD 10_1101-2021_01_02_425006 459 2 8 8 CD 10_1101-2021_01_02_425006 459 3 , , , 10_1101-2021_01_02_425006 459 4 R22 R22 NNP 10_1101-2021_01_02_425006 459 5 ( ( -LRB- 10_1101-2021_01_02_425006 459 6 2007 2007 CD 10_1101-2021_01_02_425006 459 7 ) ) -RRB- 10_1101-2021_01_02_425006 459 8 . . . 10_1101-2021_01_02_425006 460 1 544 544 CD 10_1101-2021_01_02_425006 460 2 42 42 CD 10_1101-2021_01_02_425006 460 3 . . . 10_1101-2021_01_02_425006 461 1 M. M. 
NNP 10_1101-2021_01_02_425006 461 2 Ashburner Ashburner NNP 10_1101-2021_01_02_425006 461 3 , , , 10_1101-2021_01_02_425006 461 4 S. S. NNP 10_1101-2021_01_02_425006 461 5 Lewis Lewis NNP 10_1101-2021_01_02_425006 461 6 , , , 10_1101-2021_01_02_425006 461 7 On on IN 10_1101-2021_01_02_425006 461 8 Ontologies ontology NNS 10_1101-2021_01_02_425006 461 9 for for IN 10_1101-2021_01_02_425006 461 10 Biologists Biologists NNPS 10_1101-2021_01_02_425006 461 11 : : : 10_1101-2021_01_02_425006 461 12 The the DT 10_1101-2021_01_02_425006 461 13 Gene Gene NNP 10_1101-2021_01_02_425006 461 14 Ontology Ontology NNP 10_1101-2021_01_02_425006 461 15 — — : 10_1101-2021_01_02_425006 461 16 Untangling untangle VBG 10_1101-2021_01_02_425006 461 17 the the DT 10_1101-2021_01_02_425006 461 18 Web web NN 10_1101-2021_01_02_425006 461 19 . . . 10_1101-2021_01_02_425006 462 1 545 545 CD 10_1101-2021_01_02_425006 462 2 Novartis Novartis NNP 10_1101-2021_01_02_425006 462 3 Found Found NNP 10_1101-2021_01_02_425006 462 4 . . . 10_1101-2021_01_02_425006 463 1 Symp Symp VBN 10_1101-2021_01_02_425006 463 2 . . . 
10_1101-2021_01_02_425006 464 1 247 247 CD 10_1101-2021_01_02_425006 464 2 , , , 10_1101-2021_01_02_425006 464 3 66 66 CD 10_1101-2021_01_02_425006 464 4 - - SYM 10_1101-2021_01_02_425006 464 5 80 80 CD 10_1101-2021_01_02_425006 464 6 ; ; : 10_1101-2021_01_02_425006 464 7 discussion discussion NN 10_1101-2021_01_02_425006 464 8 80 80 CD 10_1101-2021_01_02_425006 464 9 - - SYM 10_1101-2021_01_02_425006 464 10 63 63 CD 10_1101-2021_01_02_425006 464 11 , , , 10_1101-2021_01_02_425006 464 12 84 84 CD 10_1101-2021_01_02_425006 464 13 - - SYM 10_1101-2021_01_02_425006 464 14 90 90 CD 10_1101-2021_01_02_425006 464 15 , , , 10_1101-2021_01_02_425006 464 16 244 244 CD 10_1101-2021_01_02_425006 464 17 - - SYM 10_1101-2021_01_02_425006 464 18 252 252 CD 10_1101-2021_01_02_425006 464 19 ( ( -LRB- 10_1101-2021_01_02_425006 464 20 2002 2002 CD 10_1101-2021_01_02_425006 464 21 ) ) -RRB- 10_1101-2021_01_02_425006 464 22 . . . 10_1101-2021_01_02_425006 465 1 546 546 CD 10_1101-2021_01_02_425006 465 2 43 43 CD 10_1101-2021_01_02_425006 465 3 . . . 10_1101-2021_01_02_425006 466 1 H. H. NNP 10_1101-2021_01_02_425006 466 2 Wu Wu NNP 10_1101-2021_01_02_425006 466 3 , , , 10_1101-2021_01_02_425006 466 4 Z. Z. NNP 10_1101-2021_01_02_425006 466 5 Su Su NNP 10_1101-2021_01_02_425006 466 6 , , , 10_1101-2021_01_02_425006 466 7 F. F. NNP 10_1101-2021_01_02_425006 466 8 Mao Mao NNP 10_1101-2021_01_02_425006 466 9 , , , 10_1101-2021_01_02_425006 466 10 V. V. NNP 10_1101-2021_01_02_425006 466 11 Olman Olman NNP 10_1101-2021_01_02_425006 466 12 , , , 10_1101-2021_01_02_425006 466 13 Y. Y. 
NNP 10_1101-2021_01_02_425006 466 14 Xu Xu NNP 10_1101-2021_01_02_425006 466 15 , , , 10_1101-2021_01_02_425006 466 16 Prediction Prediction NNP 10_1101-2021_01_02_425006 466 17 of of IN 10_1101-2021_01_02_425006 466 18 functional functional JJ 10_1101-2021_01_02_425006 466 19 modules module NNS 10_1101-2021_01_02_425006 466 20 based base VBN 10_1101-2021_01_02_425006 466 21 on on IN 10_1101-2021_01_02_425006 466 22 comparative comparative JJ 10_1101-2021_01_02_425006 466 23 547 547 CD 10_1101-2021_01_02_425006 466 24 genome genome JJ 10_1101-2021_01_02_425006 466 25 analysis analysis NN 10_1101-2021_01_02_425006 466 26 and and CC 10_1101-2021_01_02_425006 466 27 Gene Gene NNP 10_1101-2021_01_02_425006 466 28 Ontology Ontology NNP 10_1101-2021_01_02_425006 466 29 application application NN 10_1101-2021_01_02_425006 466 30 . . . 10_1101-2021_01_02_425006 467 1 Nucleic Nucleic NNP 10_1101-2021_01_02_425006 467 2 Acids Acids NNPS 10_1101-2021_01_02_425006 467 3 Res Res NNP 10_1101-2021_01_02_425006 467 4 . . . 10_1101-2021_01_02_425006 468 1 33 33 CD 10_1101-2021_01_02_425006 468 2 , , , 10_1101-2021_01_02_425006 468 3 2822 2822 CD 10_1101-2021_01_02_425006 468 4 - - SYM 10_1101-2021_01_02_425006 468 5 2837 2837 CD 10_1101-2021_01_02_425006 468 6 ( ( -LRB- 10_1101-2021_01_02_425006 468 7 2005 2005 CD 10_1101-2021_01_02_425006 468 8 ) ) -RRB- 10_1101-2021_01_02_425006 468 9 . . . 
10_1101-2021_01_02_425006 470 25 44 44 CD 10_1101-2021_01_02_425006 470 26 . . . 10_1101-2021_01_02_425006 471 1 S. S. NNP 10_1101-2021_01_02_425006 471 2 A. A. NNP 10_1101-2021_01_02_425006 471 3 Teukolsky Teukolsky NNP 10_1101-2021_01_02_425006 471 4 , , , 10_1101-2021_01_02_425006 471 5 B. B. NNP 10_1101-2021_01_02_425006 471 6 P. P. NNP 10_1101-2021_01_02_425006 471 7 Flannery Flannery NNP 10_1101-2021_01_02_425006 471 8 , , , 10_1101-2021_01_02_425006 471 9 W. W. NNP 10_1101-2021_01_02_425006 471 10 Press Press NNP 10_1101-2021_01_02_425006 471 11 , , , 10_1101-2021_01_02_425006 471 12 W. W. 
NNP 10_1101-2021_01_02_425006 471 13 Vetterling Vetterling NNP 10_1101-2021_01_02_425006 471 14 , , , 10_1101-2021_01_02_425006 471 15 Numerical Numerical NNP 10_1101-2021_01_02_425006 471 16 Recipes Recipes NNPS 10_1101-2021_01_02_425006 471 17 in in IN 10_1101-2021_01_02_425006 471 18 C C NNP 10_1101-2021_01_02_425006 471 19 : : : 10_1101-2021_01_02_425006 471 20 The the DT 10_1101-2021_01_02_425006 471 21 Art art NN 10_1101-2021_01_02_425006 471 22 of of IN 10_1101-2021_01_02_425006 471 23 549 549 CD 10_1101-2021_01_02_425006 471 24 Scientific Scientific NNP 10_1101-2021_01_02_425006 471 25 Computing computing NN 10_1101-2021_01_02_425006 471 26 . . . 10_1101-2021_01_02_425006 472 1 Cambridge Cambridge NNP 10_1101-2021_01_02_425006 472 2 University University NNP 10_1101-2021_01_02_425006 472 3 Press Press NNP 10_1101-2021_01_02_425006 472 4 , , , 10_1101-2021_01_02_425006 472 5 Cambridge Cambridge NNP 10_1101-2021_01_02_425006 472 6 ( ( -LRB- 10_1101-2021_01_02_425006 472 7 1992 1992 CD 10_1101-2021_01_02_425006 472 8 ) ) -RRB- 10_1101-2021_01_02_425006 472 9 . . . 10_1101-2021_01_02_425006 473 1 550 550 CD 10_1101-2021_01_02_425006 473 2 45 45 CD 10_1101-2021_01_02_425006 473 3 . . . 10_1101-2021_01_02_425006 474 1 L. L. NNP 10_1101-2021_01_02_425006 474 2 Wan Wan NNP 10_1101-2021_01_02_425006 474 3 , , , 10_1101-2021_01_02_425006 474 4 X. X. NNP 10_1101-2021_01_02_425006 474 5 Yan Yan NNP 10_1101-2021_01_02_425006 474 6 , , , 10_1101-2021_01_02_425006 474 7 T. T. NNP 10_1101-2021_01_02_425006 474 8 Chen Chen NNP 10_1101-2021_01_02_425006 474 9 , , , 10_1101-2021_01_02_425006 474 10 F. F. 
NNP 10_1101-2021_01_02_425006 474 11 Sun Sun NNP 10_1101-2021_01_02_425006 474 12 , , , 10_1101-2021_01_02_425006 474 13 Modeling Modeling NNP 10_1101-2021_01_02_425006 474 14 RNA RNA NNP 10_1101-2021_01_02_425006 474 15 degradation degradation NN 10_1101-2021_01_02_425006 474 16 for for IN 10_1101-2021_01_02_425006 474 17 RNA RNA NNP 10_1101-2021_01_02_425006 474 18 - - HYPH 10_1101-2021_01_02_425006 474 19 Seq Seq NNP 10_1101-2021_01_02_425006 474 20 with with IN 10_1101-2021_01_02_425006 474 21 applications application NNS 10_1101-2021_01_02_425006 474 22 . . . 10_1101-2021_01_02_425006 475 1 551 551 CD 10_1101-2021_01_02_425006 475 2 Biostatistics Biostatistics NNP 10_1101-2021_01_02_425006 475 3 13 13 CD 10_1101-2021_01_02_425006 475 4 , , , 10_1101-2021_01_02_425006 475 5 734 734 CD 10_1101-2021_01_02_425006 475 6 - - HYPH 10_1101-2021_01_02_425006 475 7 747 747 CD 10_1101-2021_01_02_425006 475 8 ( ( -LRB- 10_1101-2021_01_02_425006 475 9 2012 2012 CD 10_1101-2021_01_02_425006 475 10 ) ) -RRB- 10_1101-2021_01_02_425006 475 11 . . . 10_1101-2021_01_02_425006 476 1 552 552 CD 10_1101-2021_01_02_425006 476 2 46 46 CD 10_1101-2021_01_02_425006 476 3 . . . 10_1101-2021_01_02_425006 477 1 C. C. NNP 10_1101-2021_01_02_425006 477 2 Yanofsky Yanofsky NNP 10_1101-2021_01_02_425006 477 3 , , , 10_1101-2021_01_02_425006 477 4 Attenuation Attenuation NNP 10_1101-2021_01_02_425006 477 5 in in IN 10_1101-2021_01_02_425006 477 6 the the DT 10_1101-2021_01_02_425006 477 7 control control NN 10_1101-2021_01_02_425006 477 8 of of IN 10_1101-2021_01_02_425006 477 9 expression expression NN 10_1101-2021_01_02_425006 477 10 of of IN 10_1101-2021_01_02_425006 477 11 bacterial bacterial JJ 10_1101-2021_01_02_425006 477 12 operons operon NNS 10_1101-2021_01_02_425006 477 13 . . . 
10_1101-2021_01_02_425006 478 1 Nature nature NN 10_1101-2021_01_02_425006 478 2 289 289 CD 10_1101-2021_01_02_425006 478 3 , , , 10_1101-2021_01_02_425006 478 4 751 751 CD 10_1101-2021_01_02_425006 478 5 ( ( -LRB- 10_1101-2021_01_02_425006 478 6 1981 1981 CD 10_1101-2021_01_02_425006 478 7 ) ) -RRB- 10_1101-2021_01_02_425006 478 8 . . . 10_1101-2021_01_02_425006 479 1 553 553 CD 10_1101-2021_01_02_425006 479 2 47 47 CD 10_1101-2021_01_02_425006 479 3 . . . 10_1101-2021_01_02_425006 480 1 B. B. NNP 10_1101-2021_01_02_425006 480 2 K. K. NNP 10_1101-2021_01_02_425006 480 3 Cho Cho NNP 10_1101-2021_01_02_425006 480 4 , , , 10_1101-2021_01_02_425006 480 5 D. D. NNP 10_1101-2021_01_02_425006 480 6 Kim Kim NNP 10_1101-2021_01_02_425006 480 7 , , , 10_1101-2021_01_02_425006 480 8 E. E. NNP 10_1101-2021_01_02_425006 480 9 M. M. NNP 10_1101-2021_01_02_425006 480 10 Knight Knight NNP 10_1101-2021_01_02_425006 480 11 , , , 10_1101-2021_01_02_425006 480 12 K. K. NNP 10_1101-2021_01_02_425006 480 13 Zengler Zengler NNP 10_1101-2021_01_02_425006 480 14 , , , 10_1101-2021_01_02_425006 480 15 B. B. NNP 10_1101-2021_01_02_425006 480 16 O. O. 
NNP 10_1101-2021_01_02_425006 480 17 Palsson Palsson NNP 10_1101-2021_01_02_425006 480 18 , , , 10_1101-2021_01_02_425006 480 19 Genome Genome NNP 10_1101-2021_01_02_425006 480 20 - - HYPH 10_1101-2021_01_02_425006 480 21 scale scale NN 10_1101-2021_01_02_425006 480 22 reconstruction reconstruction NN 10_1101-2021_01_02_425006 480 23 of of IN 10_1101-2021_01_02_425006 480 24 the the DT 10_1101-2021_01_02_425006 480 25 554 554 CD 10_1101-2021_01_02_425006 480 26 sigma sigma JJ 10_1101-2021_01_02_425006 480 27 factor factor NN 10_1101-2021_01_02_425006 480 28 network network NN 10_1101-2021_01_02_425006 480 29 in in IN 10_1101-2021_01_02_425006 480 30 Escherichia Escherichia NNP 10_1101-2021_01_02_425006 480 31 coli coli NNS 10_1101-2021_01_02_425006 480 32 : : : 10_1101-2021_01_02_425006 480 33 topology topology NN 10_1101-2021_01_02_425006 480 34 and and CC 10_1101-2021_01_02_425006 480 35 functional functional JJ 10_1101-2021_01_02_425006 480 36 states state NNS 10_1101-2021_01_02_425006 480 37 . . . 10_1101-2021_01_02_425006 481 1 BMC BMC NNP 10_1101-2021_01_02_425006 481 2 Biol Biol NNP 10_1101-2021_01_02_425006 481 3 . . . 10_1101-2021_01_02_425006 482 1 12 12 CD 10_1101-2021_01_02_425006 482 2 , , , 10_1101-2021_01_02_425006 482 3 4 4 CD 10_1101-2021_01_02_425006 482 4 - - SYM 10_1101-2021_01_02_425006 482 5 4 4 CD 10_1101-2021_01_02_425006 482 6 ( ( -LRB- 10_1101-2021_01_02_425006 482 7 2014 2014 CD 10_1101-2021_01_02_425006 482 8 ) ) -RRB- 10_1101-2021_01_02_425006 482 9 . . . 10_1101-2021_01_02_425006 483 1 555 555 CD 10_1101-2021_01_02_425006 483 2 48 48 CD 10_1101-2021_01_02_425006 483 3 . . . 10_1101-2021_01_02_425006 484 1 B.-K. B.-K. NNP 10_1101-2021_01_02_425006 484 2 Cho Cho NNP 10_1101-2021_01_02_425006 484 3 , , , 10_1101-2021_01_02_425006 484 4 P. P. NNP 10_1101-2021_01_02_425006 484 5 Charusanti Charusanti NNP 10_1101-2021_01_02_425006 484 6 , , , 10_1101-2021_01_02_425006 484 7 M. M. NNP 10_1101-2021_01_02_425006 484 8 J. J. 
NNP 10_1101-2021_01_02_425006 484 9 Herrgård Herrgård NNP 10_1101-2021_01_02_425006 484 10 , , , 10_1101-2021_01_02_425006 484 11 Microbial Microbial NNP 10_1101-2021_01_02_425006 484 12 regulatory regulatory JJ 10_1101-2021_01_02_425006 484 13 and and CC 10_1101-2021_01_02_425006 484 14 metabolic metabolic JJ 10_1101-2021_01_02_425006 484 15 networks network NNS 10_1101-2021_01_02_425006 484 16 . . . 10_1101-2021_01_02_425006 485 1 Curr curr UH 10_1101-2021_01_02_425006 485 2 . . . 10_1101-2021_01_02_425006 486 1 Opin opin JJ 10_1101-2021_01_02_425006 486 2 . . . 10_1101-2021_01_02_425006 487 1 556 556 CD 10_1101-2021_01_02_425006 487 2 Biotechnol Biotechnol NNPS 10_1101-2021_01_02_425006 487 3 . . . 10_1101-2021_01_02_425006 488 1 18 18 CD 10_1101-2021_01_02_425006 488 2 , , , 10_1101-2021_01_02_425006 488 3 360 360 CD 10_1101-2021_01_02_425006 488 4 - - SYM 10_1101-2021_01_02_425006 488 5 364 364 CD 10_1101-2021_01_02_425006 488 6 ( ( -LRB- 10_1101-2021_01_02_425006 488 7 2007 2007 CD 10_1101-2021_01_02_425006 488 8 ) ) -RRB- 10_1101-2021_01_02_425006 488 9 . . . 10_1101-2021_01_02_425006 489 1 557 557 CD 10_1101-2021_01_02_425006 489 2 49 49 CD 10_1101-2021_01_02_425006 489 3 . . . 10_1101-2021_01_02_425006 490 1 A. A. NNP 10_1101-2021_01_02_425006 490 2 Toledo Toledo NNP 10_1101-2021_01_02_425006 490 3 - - HYPH 10_1101-2021_01_02_425006 490 4 Arana Arana NNP 10_1101-2021_01_02_425006 490 5 , , , 10_1101-2021_01_02_425006 490 6 O. O. NNP 10_1101-2021_01_02_425006 490 7 Dussurget Dussurget NNP 10_1101-2021_01_02_425006 490 8 , , , 10_1101-2021_01_02_425006 490 9 G. G. NNP 10_1101-2021_01_02_425006 490 10 Nikitas Nikitas NNP 10_1101-2021_01_02_425006 490 11 , , , 10_1101-2021_01_02_425006 490 12 N. N. NNP 10_1101-2021_01_02_425006 490 13 Sesto Sesto NNP 10_1101-2021_01_02_425006 490 14 , , , 10_1101-2021_01_02_425006 490 15 H. H. 
NNP 10_1101-2021_01_02_425006 490 16 Guet Guet NNP 10_1101-2021_01_02_425006 490 17 - - HYPH 10_1101-2021_01_02_425006 490 18 Revillet Revillet NNP 10_1101-2021_01_02_425006 490 19 , , , 10_1101-2021_01_02_425006 490 20 D. D. NNP 10_1101-2021_01_02_425006 490 21 Balestrino Balestrino NNP 10_1101-2021_01_02_425006 490 22 , , , 10_1101-2021_01_02_425006 490 23 E. E. NNP 10_1101-2021_01_02_425006 490 24 Loh Loh NNP 10_1101-2021_01_02_425006 490 25 , , , 10_1101-2021_01_02_425006 490 26 J. J. NNP 10_1101-2021_01_02_425006 491 1 558 558 CD 10_1101-2021_01_02_425006 491 2 Gripenland Gripenland NNP 10_1101-2021_01_02_425006 491 3 , , , 10_1101-2021_01_02_425006 491 4 T. T. NNP 10_1101-2021_01_02_425006 491 5 Tiensuu Tiensuu NNP 10_1101-2021_01_02_425006 491 6 , , , 10_1101-2021_01_02_425006 491 7 K. K. NNP 10_1101-2021_01_02_425006 491 8 Vaitkevicius Vaitkevicius NNP 10_1101-2021_01_02_425006 491 9 , , , 10_1101-2021_01_02_425006 491 10 The the DT 10_1101-2021_01_02_425006 491 11 Listeria Listeria NNP 10_1101-2021_01_02_425006 491 12 transcriptional transcriptional JJ 10_1101-2021_01_02_425006 491 13 landscape landscape NN 10_1101-2021_01_02_425006 491 14 from from IN 10_1101-2021_01_02_425006 491 15 saprophytism saprophytism NNP 10_1101-2021_01_02_425006 491 16 559 559 CD 10_1101-2021_01_02_425006 491 17 to to IN 10_1101-2021_01_02_425006 491 18 virulence virulence NN 10_1101-2021_01_02_425006 491 19 . . . 10_1101-2021_01_02_425006 492 1 Nature nature NN 10_1101-2021_01_02_425006 492 2 459 459 CD 10_1101-2021_01_02_425006 492 3 , , , 10_1101-2021_01_02_425006 492 4 950 950 CD 10_1101-2021_01_02_425006 492 5 - - SYM 10_1101-2021_01_02_425006 492 6 956 956 CD 10_1101-2021_01_02_425006 492 7 ( ( -LRB- 10_1101-2021_01_02_425006 492 8 2009 2009 CD 10_1101-2021_01_02_425006 492 9 ) ) -RRB- 10_1101-2021_01_02_425006 492 10 . . . 10_1101-2021_01_02_425006 493 1 560 560 CD 10_1101-2021_01_02_425006 493 2 50 50 CD 10_1101-2021_01_02_425006 493 3 . . . 
10_1101-2021_01_02_425006 494 1 B. B. NNP 10_1101-2021_01_02_425006 494 2 Yue Yue NNP 10_1101-2021_01_02_425006 494 3 , , , 10_1101-2021_01_02_425006 494 4 X. X. NNP 10_1101-2021_01_02_425006 494 5 Luo Luo NNP 10_1101-2021_01_02_425006 494 6 , , , 10_1101-2021_01_02_425006 494 7 Z. Z. NNP 10_1101-2021_01_02_425006 494 8 Yu Yu NNP 10_1101-2021_01_02_425006 494 9 , , , 10_1101-2021_01_02_425006 494 10 S. S. NNP 10_1101-2021_01_02_425006 494 11 Mani Mani NNP 10_1101-2021_01_02_425006 494 12 , , , 10_1101-2021_01_02_425006 494 13 Z. Z. NNP 10_1101-2021_01_02_425006 494 14 Wang Wang NNP 10_1101-2021_01_02_425006 494 15 , , , 10_1101-2021_01_02_425006 494 16 W. W. NNP 10_1101-2021_01_02_425006 494 17 Dou Dou NNP 10_1101-2021_01_02_425006 494 18 , , , 10_1101-2021_01_02_425006 494 19 Inflammatory Inflammatory NNP 10_1101-2021_01_02_425006 494 20 bowel bowel NN 10_1101-2021_01_02_425006 494 21 disease disease NN 10_1101-2021_01_02_425006 494 22 : : : 10_1101-2021_01_02_425006 494 23 a a DT 10_1101-2021_01_02_425006 494 24 potential potential JJ 10_1101-2021_01_02_425006 494 25 result result NN 10_1101-2021_01_02_425006 494 26 561 561 CD 10_1101-2021_01_02_425006 494 27 from from IN 10_1101-2021_01_02_425006 494 28 the the DT 10_1101-2021_01_02_425006 494 29 collusion collusion NN 10_1101-2021_01_02_425006 494 30 between between IN 10_1101-2021_01_02_425006 494 31 gut gut NNP 10_1101-2021_01_02_425006 494 32 microbiota microbiota NNP 10_1101-2021_01_02_425006 494 33 and and CC 10_1101-2021_01_02_425006 494 34 mucosal mucosal NN 10_1101-2021_01_02_425006 494 35 immune immune JJ 10_1101-2021_01_02_425006 494 36 system system NN 10_1101-2021_01_02_425006 494 37 . . . 
10_1101-2021_01_02_425006 495 1 Microorganisms microorganism NNS 10_1101-2021_01_02_425006 495 2 7 7 CD 10_1101-2021_01_02_425006 495 3 , , , 10_1101-2021_01_02_425006 495 4 440 440 CD 10_1101-2021_01_02_425006 495 5 562 562 CD 10_1101-2021_01_02_425006 495 6 ( ( -LRB- 10_1101-2021_01_02_425006 495 7 2019 2019 CD 10_1101-2021_01_02_425006 495 8 ) ) -RRB- 10_1101-2021_01_02_425006 495 9 . . . 10_1101-2021_01_02_425006 496 1 563 563 CD 10_1101-2021_01_02_425006 496 2 51 51 CD 10_1101-2021_01_02_425006 496 3 . . . 10_1101-2021_01_02_425006 497 1 B. B. NNP 10_1101-2021_01_02_425006 497 2 H. H. NNP 10_1101-2021_01_02_425006 497 3 Mullish Mullish NNP 10_1101-2021_01_02_425006 497 4 , , , 10_1101-2021_01_02_425006 497 5 H. H. NNP 10_1101-2021_01_02_425006 497 6 R. R. NNP 10_1101-2021_01_02_425006 497 7 Williams Williams NNP 10_1101-2021_01_02_425006 497 8 , , , 10_1101-2021_01_02_425006 497 9 Clostridium Clostridium NNP 10_1101-2021_01_02_425006 497 10 difficile difficile NN 10_1101-2021_01_02_425006 497 11 infection infection NN 10_1101-2021_01_02_425006 497 12 and and CC 10_1101-2021_01_02_425006 497 13 antibiotic antibiotic NN 10_1101-2021_01_02_425006 497 14 - - HYPH 10_1101-2021_01_02_425006 497 15 associated associate VBN 10_1101-2021_01_02_425006 497 16 diarrhoea diarrhoea NNP 10_1101-2021_01_02_425006 497 17 . . . 10_1101-2021_01_02_425006 498 1 564 564 CD 10_1101-2021_01_02_425006 498 2 Clin Clin NNP 10_1101-2021_01_02_425006 498 3 . . . 10_1101-2021_01_02_425006 499 1 Med Med NNP 10_1101-2021_01_02_425006 499 2 . . . 10_1101-2021_01_02_425006 500 1 18 18 CD 10_1101-2021_01_02_425006 500 2 , , , 10_1101-2021_01_02_425006 500 3 237 237 CD 10_1101-2021_01_02_425006 500 4 ( ( -LRB- 10_1101-2021_01_02_425006 500 5 2018 2018 CD 10_1101-2021_01_02_425006 500 6 ) ) -RRB- 10_1101-2021_01_02_425006 500 7 . . . 
10_1101-2021_01_02_425006 502 25 52 52 CD 10_1101-2021_01_02_425006 502 26 . . . 10_1101-2021_01_02_425006 503 1 M. M. NNP 10_1101-2021_01_02_425006 503 2 Maguire Maguire NNP 10_1101-2021_01_02_425006 503 3 , , , 10_1101-2021_01_02_425006 503 4 G. G. 
NNP 10_1101-2021_01_02_425006 503 5 Maguire Maguire NNP 10_1101-2021_01_02_425006 503 6 , , , 10_1101-2021_01_02_425006 503 7 Gut Gut NNP 10_1101-2021_01_02_425006 503 8 dysbiosis dysbiosis NN 10_1101-2021_01_02_425006 503 9 , , , 10_1101-2021_01_02_425006 503 10 leaky leaky JJ 10_1101-2021_01_02_425006 503 11 gut gut NN 10_1101-2021_01_02_425006 503 12 , , , 10_1101-2021_01_02_425006 503 13 and and CC 10_1101-2021_01_02_425006 503 14 intestinal intestinal JJ 10_1101-2021_01_02_425006 503 15 epithelial epithelial JJ 10_1101-2021_01_02_425006 503 16 proliferation proliferation NN 10_1101-2021_01_02_425006 503 17 in in IN 10_1101-2021_01_02_425006 503 18 566 566 CD 10_1101-2021_01_02_425006 503 19 neurological neurological JJ 10_1101-2021_01_02_425006 503 20 disorders disorder NNS 10_1101-2021_01_02_425006 503 21 : : : 10_1101-2021_01_02_425006 503 22 towards towards IN 10_1101-2021_01_02_425006 503 23 the the DT 10_1101-2021_01_02_425006 503 24 development development NN 10_1101-2021_01_02_425006 503 25 of of IN 10_1101-2021_01_02_425006 503 26 a a DT 10_1101-2021_01_02_425006 503 27 new new JJ 10_1101-2021_01_02_425006 503 28 therapeutic therapeutic NN 10_1101-2021_01_02_425006 503 29 using use VBG 10_1101-2021_01_02_425006 503 30 amino amino JJ 10_1101-2021_01_02_425006 503 31 acids acid NNS 10_1101-2021_01_02_425006 503 32 , , , 10_1101-2021_01_02_425006 503 33 567 567 CD 10_1101-2021_01_02_425006 503 34 prebiotics prebiotic NNS 10_1101-2021_01_02_425006 503 35 , , , 10_1101-2021_01_02_425006 503 36 probiotics probiotic NNS 10_1101-2021_01_02_425006 503 37 , , , 10_1101-2021_01_02_425006 503 38 and and CC 10_1101-2021_01_02_425006 503 39 postbiotics postbiotic NNS 10_1101-2021_01_02_425006 503 40 . . . 10_1101-2021_01_02_425006 504 1 Rev. Rev. NNP 10_1101-2021_01_02_425006 505 1 Neurosci Neurosci NNP 10_1101-2021_01_02_425006 505 2 . . . 
10_1101-2021_01_02_425006 506 1 30 30 CD 10_1101-2021_01_02_425006 506 2 , , , 10_1101-2021_01_02_425006 506 3 179 179 CD 10_1101-2021_01_02_425006 506 4 - - SYM 10_1101-2021_01_02_425006 506 5 201 201 CD 10_1101-2021_01_02_425006 506 6 ( ( -LRB- 10_1101-2021_01_02_425006 506 7 2019 2019 CD 10_1101-2021_01_02_425006 506 8 ) ) -RRB- 10_1101-2021_01_02_425006 506 9 . . . 10_1101-2021_01_02_425006 507 1 568 568 CD 10_1101-2021_01_02_425006 507 2 53 53 CD 10_1101-2021_01_02_425006 507 3 . . . 10_1101-2021_01_02_425006 508 1 S. S. NNP 10_1101-2021_01_02_425006 508 2 Vivarelli Vivarelli NNP 10_1101-2021_01_02_425006 508 3 , , , 10_1101-2021_01_02_425006 508 4 R. R. NNP 10_1101-2021_01_02_425006 508 5 Salemi Salemi NNP 10_1101-2021_01_02_425006 508 6 , , , 10_1101-2021_01_02_425006 508 7 S. S. NNP 10_1101-2021_01_02_425006 508 8 Candido Candido NNP 10_1101-2021_01_02_425006 508 9 , , , 10_1101-2021_01_02_425006 508 10 L. L. NNP 10_1101-2021_01_02_425006 508 11 Falzone Falzone NNP 10_1101-2021_01_02_425006 508 12 , , , 10_1101-2021_01_02_425006 508 13 M. M. NNP 10_1101-2021_01_02_425006 508 14 Santagati Santagati NNP 10_1101-2021_01_02_425006 508 15 , , , 10_1101-2021_01_02_425006 508 16 S. S. NNP 10_1101-2021_01_02_425006 508 17 Stefani Stefani NNP 10_1101-2021_01_02_425006 508 18 , , , 10_1101-2021_01_02_425006 508 19 F. F. NNP 10_1101-2021_01_02_425006 508 20 Torino Torino NNP 10_1101-2021_01_02_425006 508 21 , , , 10_1101-2021_01_02_425006 508 22 G. G. NNP 10_1101-2021_01_02_425006 508 23 L. L. NNP 10_1101-2021_01_02_425006 508 24 Banna Banna NNP 10_1101-2021_01_02_425006 508 25 , , , 10_1101-2021_01_02_425006 508 26 569 569 CD 10_1101-2021_01_02_425006 508 27 G. G. NNP 10_1101-2021_01_02_425006 508 28 Tonini Tonini NNP 10_1101-2021_01_02_425006 508 29 , , , 10_1101-2021_01_02_425006 508 30 M. M. 
NNP 10_1101-2021_01_02_425006 508 31 Libra Libra NNP 10_1101-2021_01_02_425006 508 32 , , , 10_1101-2021_01_02_425006 508 33 Gut Gut NNP 10_1101-2021_01_02_425006 508 34 microbiota microbiota NNP 10_1101-2021_01_02_425006 508 35 and and CC 10_1101-2021_01_02_425006 508 36 cancer cancer NN 10_1101-2021_01_02_425006 508 37 : : : 10_1101-2021_01_02_425006 508 38 from from IN 10_1101-2021_01_02_425006 508 39 pathogenesis pathogenesis NN 10_1101-2021_01_02_425006 508 40 to to IN 10_1101-2021_01_02_425006 508 41 therapy therapy NN 10_1101-2021_01_02_425006 508 42 . . . 10_1101-2021_01_02_425006 509 1 Cancers cancer NNS 10_1101-2021_01_02_425006 509 2 11 11 CD 10_1101-2021_01_02_425006 509 3 , , , 10_1101-2021_01_02_425006 509 4 38 38 CD 10_1101-2021_01_02_425006 509 5 570 570 CD 10_1101-2021_01_02_425006 509 6 ( ( -LRB- 10_1101-2021_01_02_425006 509 7 2019 2019 CD 10_1101-2021_01_02_425006 509 8 ) ) -RRB- 10_1101-2021_01_02_425006 509 9 . . . 10_1101-2021_01_02_425006 510 1 571 571 CD 10_1101-2021_01_02_425006 510 2 54 54 CD 10_1101-2021_01_02_425006 510 3 . . . 10_1101-2021_01_02_425006 511 1 G. G. NNP 10_1101-2021_01_02_425006 511 2 Cammarota Cammarota NNP 10_1101-2021_01_02_425006 511 3 , , , 10_1101-2021_01_02_425006 511 4 G. G. NNP 10_1101-2021_01_02_425006 511 5 Ianiro Ianiro NNP 10_1101-2021_01_02_425006 511 6 , , , 10_1101-2021_01_02_425006 511 7 A. A. NNP 10_1101-2021_01_02_425006 511 8 Ahern Ahern NNP 10_1101-2021_01_02_425006 511 9 , , , 10_1101-2021_01_02_425006 511 10 C. C. NNP 10_1101-2021_01_02_425006 511 11 Carbone Carbone NNP 10_1101-2021_01_02_425006 511 12 , , , 10_1101-2021_01_02_425006 511 13 A. A. NNP 10_1101-2021_01_02_425006 511 14 Temko Temko NNP 10_1101-2021_01_02_425006 511 15 , , , 10_1101-2021_01_02_425006 511 16 M. M. NNP 10_1101-2021_01_02_425006 511 17 J. J. NNP 10_1101-2021_01_02_425006 511 18 Claesson Claesson NNP 10_1101-2021_01_02_425006 511 19 , , , 10_1101-2021_01_02_425006 511 20 A. A. 
NNP 10_1101-2021_01_02_425006 511 21 Gasbarrini Gasbarrini NNP 10_1101-2021_01_02_425006 511 22 , , , 10_1101-2021_01_02_425006 511 23 G. G. NNP 10_1101-2021_01_02_425006 511 24 572 572 CD 10_1101-2021_01_02_425006 511 25 Tortora Tortora NNP 10_1101-2021_01_02_425006 511 26 , , , 10_1101-2021_01_02_425006 511 27 Gut Gut NNP 10_1101-2021_01_02_425006 511 28 microbiome microbiome NN 10_1101-2021_01_02_425006 511 29 , , , 10_1101-2021_01_02_425006 511 30 big big JJ 10_1101-2021_01_02_425006 511 31 data datum NNS 10_1101-2021_01_02_425006 511 32 and and CC 10_1101-2021_01_02_425006 511 33 machine machine NN 10_1101-2021_01_02_425006 511 34 learning learn VBG 10_1101-2021_01_02_425006 511 35 to to TO 10_1101-2021_01_02_425006 511 36 promote promote VB 10_1101-2021_01_02_425006 511 37 precision precision NN 10_1101-2021_01_02_425006 511 38 medicine medicine NN 10_1101-2021_01_02_425006 511 39 for for IN 10_1101-2021_01_02_425006 511 40 cancer cancer NN 10_1101-2021_01_02_425006 511 41 . . . 10_1101-2021_01_02_425006 512 1 573 573 CD 10_1101-2021_01_02_425006 512 2 Nature Nature NNP 10_1101-2021_01_02_425006 512 3 Reviews Reviews NNPS 10_1101-2021_01_02_425006 512 4 Gastroenterology Gastroenterology NNP 10_1101-2021_01_02_425006 512 5 & & CC 10_1101-2021_01_02_425006 512 6 Hepatology Hepatology NNP 10_1101-2021_01_02_425006 512 7 17 17 CD 10_1101-2021_01_02_425006 512 8 , , , 10_1101-2021_01_02_425006 512 9 635 635 CD 10_1101-2021_01_02_425006 512 10 - - SYM 10_1101-2021_01_02_425006 512 11 648 648 CD 10_1101-2021_01_02_425006 512 12 ( ( -LRB- 10_1101-2021_01_02_425006 512 13 2020 2020 CD 10_1101-2021_01_02_425006 512 14 ) ) -RRB- 10_1101-2021_01_02_425006 512 15 . . . 10_1101-2021_01_02_425006 513 1 574 574 CD 10_1101-2021_01_02_425006 513 2 55 55 CD 10_1101-2021_01_02_425006 513 3 . . . 10_1101-2021_01_02_425006 514 1 S. S. NNP 10_1101-2021_01_02_425006 514 2 S. S. NNP 10_1101-2021_01_02_425006 514 3 A. a. 
NN 10_1101-2021_01_02_425006 514 4 Zaidi Zaidi NNP 10_1101-2021_01_02_425006 514 5 , , , 10_1101-2021_01_02_425006 514 6 X. X. NNP 10_1101-2021_01_02_425006 514 7 Zhang Zhang NNP 10_1101-2021_01_02_425006 514 8 , , , 10_1101-2021_01_02_425006 514 9 Computational Computational NNP 10_1101-2021_01_02_425006 514 10 operon operon NN 10_1101-2021_01_02_425006 514 11 prediction prediction NN 10_1101-2021_01_02_425006 514 12 in in IN 10_1101-2021_01_02_425006 514 13 whole whole JJ 10_1101-2021_01_02_425006 514 14 - - HYPH 10_1101-2021_01_02_425006 514 15 genomes genome NNS 10_1101-2021_01_02_425006 514 16 and and CC 10_1101-2021_01_02_425006 514 17 metagenomes metagenome NNS 10_1101-2021_01_02_425006 514 18 . . . 10_1101-2021_01_02_425006 515 1 575 575 CD 10_1101-2021_01_02_425006 515 2 Briefings briefing NNS 10_1101-2021_01_02_425006 515 3 in in IN 10_1101-2021_01_02_425006 515 4 functional functional JJ 10_1101-2021_01_02_425006 515 5 genomics genomic NNS 10_1101-2021_01_02_425006 515 6 16 16 CD 10_1101-2021_01_02_425006 515 7 , , , 10_1101-2021_01_02_425006 515 8 181 181 CD 10_1101-2021_01_02_425006 515 9 - - SYM 10_1101-2021_01_02_425006 515 10 193 193 CD 10_1101-2021_01_02_425006 515 11 ( ( -LRB- 10_1101-2021_01_02_425006 515 12 2017 2017 CD 10_1101-2021_01_02_425006 515 13 ) ) -RRB- 10_1101-2021_01_02_425006 515 14 . . . 
10_1101-2021_01_02_425006 516 1 576 576 CD 10_1101-2021_01_02_425006 516 2 ACKNOWLEDGEMENTS acknowledgement NNS 10_1101-2021_01_02_425006 516 3 577 577 CD 10_1101-2021_01_02_425006 516 4 Funding funding NN 10_1101-2021_01_02_425006 516 5 : : : 10_1101-2021_01_02_425006 516 6 This this DT 10_1101-2021_01_02_425006 516 7 work work NN 10_1101-2021_01_02_425006 516 8 was be VBD 10_1101-2021_01_02_425006 516 9 supported support VBN 10_1101-2021_01_02_425006 516 10 by by IN 10_1101-2021_01_02_425006 516 11 the the DT 10_1101-2021_01_02_425006 516 12 National National NNP 10_1101-2021_01_02_425006 516 13 Nature Nature NNP 10_1101-2021_01_02_425006 516 14 Science Science NNP 10_1101-2021_01_02_425006 516 15 Foundation Foundation NNP 10_1101-2021_01_02_425006 516 16 of of IN 10_1101-2021_01_02_425006 516 17 China China NNP 10_1101-2021_01_02_425006 516 18 ( ( -LRB- 10_1101-2021_01_02_425006 516 19 NSFC NSFC NNP 10_1101-2021_01_02_425006 516 20 ) ) -RRB- 10_1101-2021_01_02_425006 516 21 578 578 CD 10_1101-2021_01_02_425006 516 22 [ [ -LRB- 10_1101-2021_01_02_425006 516 23 61772313 61772313 CD 10_1101-2021_01_02_425006 516 24 to to IN 10_1101-2021_01_02_425006 516 25 B.L. B.L. NNP 10_1101-2021_01_02_425006 516 26 , , , 10_1101-2021_01_02_425006 516 27 11931008 11931008 CD 10_1101-2021_01_02_425006 516 28 to to IN 10_1101-2021_01_02_425006 516 29 B.L. B.L. 
NNP 10_1101-2021_01_02_425006 517 1 ] ] -RRB- 10_1101-2021_01_02_425006 517 2 ; ; : 10_1101-2021_01_02_425006 517 3 Interdisciplinary Interdisciplinary NNP 10_1101-2021_01_02_425006 517 4 Science Science NNP 10_1101-2021_01_02_425006 517 5 Innovation Innovation NNP 10_1101-2021_01_02_425006 517 6 Group Group NNP 10_1101-2021_01_02_425006 517 7 Project Project NNP 10_1101-2021_01_02_425006 517 8 of of IN 10_1101-2021_01_02_425006 517 9 Shandong Shandong NNP 10_1101-2021_01_02_425006 517 10 579 579 CD 10_1101-2021_01_02_425006 517 11 University University NNP 10_1101-2021_01_02_425006 517 12 ( ( -LRB- 10_1101-2021_01_02_425006 517 13 2019 2019 CD 10_1101-2021_01_02_425006 517 14 ) ) -RRB- 10_1101-2021_01_02_425006 517 15 ; ; : 10_1101-2021_01_02_425006 517 16 and and CC 10_1101-2021_01_02_425006 517 17 the the DT 10_1101-2021_01_02_425006 517 18 Innovation Innovation NNP 10_1101-2021_01_02_425006 517 19 Method Method NNP 10_1101-2021_01_02_425006 517 20 Fund Fund NNP 10_1101-2021_01_02_425006 517 21 of of IN 10_1101-2021_01_02_425006 517 22 China China NNP 10_1101-2021_01_02_425006 517 23 [ [ -LRB- 10_1101-2021_01_02_425006 517 24 2018IM020200 2018im020200 CD 10_1101-2021_01_02_425006 517 25 to to IN 10_1101-2021_01_02_425006 517 26 B.L. B.L. NNP 10_1101-2021_01_02_425006 517 27 ] ] -RRB- 10_1101-2021_01_02_425006 517 28 . . . 
10_1101-2021_01_02_425006 518 1 The the DT 10_1101-2021_01_02_425006 518 2 authors author NNS 10_1101-2021_01_02_425006 518 3 580 580 CD 10_1101-2021_01_02_425006 518 4 would would MD 10_1101-2021_01_02_425006 518 5 like like VB 10_1101-2021_01_02_425006 518 6 to to TO 10_1101-2021_01_02_425006 518 7 thank thank VB 10_1101-2021_01_02_425006 518 8 Yang Yang NNP 10_1101-2021_01_02_425006 518 9 Li Li NNP 10_1101-2021_01_02_425006 518 10 for for IN 10_1101-2021_01_02_425006 518 11 his -PRON- PRP$ 10_1101-2021_01_02_425006 518 12 assistance assistance NN 10_1101-2021_01_02_425006 518 13 in in IN 10_1101-2021_01_02_425006 518 14 language language NN 10_1101-2021_01_02_425006 518 15 polishing polishing NN 10_1101-2021_01_02_425006 518 16 . . . 10_1101-2021_01_02_425006 519 1 Authors author NNS 10_1101-2021_01_02_425006 519 2 ’ ’ POS 10_1101-2021_01_02_425006 519 3 contributions contribution NNS 10_1101-2021_01_02_425006 519 4 : : : 10_1101-2021_01_02_425006 519 5 B.L. B.L. NNP 10_1101-2021_01_02_425006 519 6 , , , 10_1101-2021_01_02_425006 519 7 581 581 CD 10_1101-2021_01_02_425006 519 8 Q.M. Q.M. NNP 10_1101-2021_01_02_425006 520 1 and and CC 10_1101-2021_01_02_425006 520 2 W.C. W.C. NNP 10_1101-2021_01_02_425006 520 3 conceived conceive VBD 10_1101-2021_01_02_425006 520 4 the the DT 10_1101-2021_01_02_425006 520 5 basic basic JJ 10_1101-2021_01_02_425006 520 6 idea idea NN 10_1101-2021_01_02_425006 520 7 and and CC 10_1101-2021_01_02_425006 520 8 designed design VBD 10_1101-2021_01_02_425006 520 9 the the DT 10_1101-2021_01_02_425006 520 10 overall overall JJ 10_1101-2021_01_02_425006 520 11 analyses analysis NNS 10_1101-2021_01_02_425006 520 12 . . . 10_1101-2021_01_02_425006 521 1 Q.W. Q.W. 
NNP 10_1101-2021_01_02_425006 522 1 carried carry VBN 10_1101-2021_01_02_425006 522 2 out out RP 10_1101-2021_01_02_425006 522 3 most most JJS 10_1101-2021_01_02_425006 522 4 of of IN 10_1101-2021_01_02_425006 522 5 the the DT 10_1101-2021_01_02_425006 522 6 582 582 CD 10_1101-2021_01_02_425006 522 7 computational computational JJ 10_1101-2021_01_02_425006 522 8 analysis analysis NN 10_1101-2021_01_02_425006 522 9 and and CC 10_1101-2021_01_02_425006 522 10 data datum NNS 10_1101-2021_01_02_425006 522 11 interpretation interpretation NN 10_1101-2021_01_02_425006 522 12 . . . 10_1101-2021_01_02_425006 523 1 All all PDT 10_1101-2021_01_02_425006 523 2 the the DT 10_1101-2021_01_02_425006 523 3 authors author NNS 10_1101-2021_01_02_425006 523 4 wrote write VBD 10_1101-2021_01_02_425006 523 5 the the DT 10_1101-2021_01_02_425006 523 6 manuscript manuscript NN 10_1101-2021_01_02_425006 523 7 . . . 10_1101-2021_01_02_425006 524 1 Competing compete VBG 10_1101-2021_01_02_425006 524 2 583 583 CD 
10_1101-2021_01_02_425006 525 25 interests interest NNS 10_1101-2021_01_02_425006 525 26 : : : 10_1101-2021_01_02_425006 525 27 The the DT 10_1101-2021_01_02_425006 525 28 authors author NNS 10_1101-2021_01_02_425006 525 29 declare declare VBP 10_1101-2021_01_02_425006 525 30 that that IN 10_1101-2021_01_02_425006 525 31 they -PRON- PRP 10_1101-2021_01_02_425006 525 32 have have VBP 10_1101-2021_01_02_425006 525 33 no no DT 10_1101-2021_01_02_425006 525 34 competing compete VBG 10_1101-2021_01_02_425006 525 35 interests interest NNS 10_1101-2021_01_02_425006 525 36 . . . 
10_1101-2021_01_02_425006 526 1 Data datum NNS 10_1101-2021_01_02_425006 526 2 and and CC 10_1101-2021_01_02_425006 526 3 materials material NNS 10_1101-2021_01_02_425006 526 4 availability availability NN 10_1101-2021_01_02_425006 526 5 : : : 10_1101-2021_01_02_425006 526 6 584 584 CD 10_1101-2021_01_02_425006 526 7 The the DT 10_1101-2021_01_02_425006 526 8 raw raw JJ 10_1101-2021_01_02_425006 526 9 data datum NNS 10_1101-2021_01_02_425006 526 10 and and CC 10_1101-2021_01_02_425006 526 11 source source NN 10_1101-2021_01_02_425006 526 12 code code NN 10_1101-2021_01_02_425006 526 13 of of IN 10_1101-2021_01_02_425006 526 14 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 526 15 and and CC 10_1101-2021_01_02_425006 526 16 a a DT 10_1101-2021_01_02_425006 526 17 detailed detailed JJ 10_1101-2021_01_02_425006 526 18 tutorial tutorial NN 10_1101-2021_01_02_425006 526 19 can can MD 10_1101-2021_01_02_425006 526 20 be be VB 10_1101-2021_01_02_425006 526 21 found find VBN 10_1101-2021_01_02_425006 526 22 at at IN 10_1101-2021_01_02_425006 526 23 585 585 CD 10_1101-2021_01_02_425006 526 24 https://github.com/OSU-BMBL/SeqATU https://github.com/OSU-BMBL/SeqATU NNS 10_1101-2021_01_02_425006 526 25 . . . 10_1101-2021_01_02_425006 527 1 586 586 CD 10_1101-2021_01_02_425006 527 2 FIGURES figure NNS 10_1101-2021_01_02_425006 527 3 AND and CC 10_1101-2021_01_02_425006 527 4 TABLES tables NN 10_1101-2021_01_02_425006 527 5 587 587 CD 10_1101-2021_01_02_425006 527 6 Table table NN 10_1101-2021_01_02_425006 527 7 1 1 CD 10_1101-2021_01_02_425006 527 8 . . . 
10_1101-2021_01_02_425006 528 1 Results result NNS 10_1101-2021_01_02_425006 528 2 of of IN 10_1101-2021_01_02_425006 528 3 predicted predict VBN 10_1101-2021_01_02_425006 528 4 ATUs atu NNS 10_1101-2021_01_02_425006 528 5 verified verify VBN 10_1101-2021_01_02_425006 528 6 by by IN 10_1101-2021_01_02_425006 528 7 experimental experimental JJ 10_1101-2021_01_02_425006 528 8 TSSs tss NNS 10_1101-2021_01_02_425006 528 9 or or CC 10_1101-2021_01_02_425006 528 10 TF TF NNP 10_1101-2021_01_02_425006 528 11 binding bind VBG 10_1101-2021_01_02_425006 528 12 sites site NNS 10_1101-2021_01_02_425006 528 13 . . . 10_1101-2021_01_02_425006 529 1 Overview overview NN 10_1101-2021_01_02_425006 529 2 of of IN 10_1101-2021_01_02_425006 529 3 588 588 CD 10_1101-2021_01_02_425006 529 4 the the DT 10_1101-2021_01_02_425006 529 5 experimental experimental JJ 10_1101-2021_01_02_425006 529 6 TSS TSS NNP 10_1101-2021_01_02_425006 529 7 and and CC 10_1101-2021_01_02_425006 529 8 TF TF NNP 10_1101-2021_01_02_425006 529 9 binding bind VBG 10_1101-2021_01_02_425006 529 10 site site NN 10_1101-2021_01_02_425006 529 11 datasets dataset NNS 10_1101-2021_01_02_425006 529 12 ( ( -LRB- 10_1101-2021_01_02_425006 529 13 dataset dataset NNP 10_1101-2021_01_02_425006 529 14 1 1 CD 10_1101-2021_01_02_425006 529 15 and and CC 10_1101-2021_01_02_425006 529 16 dataset dataset NNP 10_1101-2021_01_02_425006 529 17 2 2 CD 10_1101-2021_01_02_425006 529 18 ) ) -RRB- 10_1101-2021_01_02_425006 529 19 and and CC 10_1101-2021_01_02_425006 529 20 the the DT 10_1101-2021_01_02_425006 529 21 proportion proportion NN 10_1101-2021_01_02_425006 529 22 of of IN 10_1101-2021_01_02_425006 529 23 5’-end 5’-end CD 10_1101-2021_01_02_425006 529 24 589 589 CD 10_1101-2021_01_02_425006 529 25 genes gene NNS 10_1101-2021_01_02_425006 529 26 and and CC 10_1101-2021_01_02_425006 529 27 no no DT 10_1101-2021_01_02_425006 529 28 5’-end 5’-end CD 10_1101-2021_01_02_425006 529 29 genes gene NNS 10_1101-2021_01_02_425006 529 30 of of 
IN 10_1101-2021_01_02_425006 529 31 the the DT 10_1101-2021_01_02_425006 529 32 predicted predict VBN 10_1101-2021_01_02_425006 529 33 ATUs atu NNS 10_1101-2021_01_02_425006 529 34 by by IN 10_1101-2021_01_02_425006 529 35 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 529 36 for for IN 10_1101-2021_01_02_425006 529 37 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 529 38 and and CC 10_1101-2021_01_02_425006 529 39 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 529 40 , , , 10_1101-2021_01_02_425006 529 41 which which WDT 10_1101-2021_01_02_425006 529 42 590 590 CD 10_1101-2021_01_02_425006 529 43 were be VBD 10_1101-2021_01_02_425006 529 44 validated validate VBN 10_1101-2021_01_02_425006 529 45 by by IN 10_1101-2021_01_02_425006 529 46 experimental experimental JJ 10_1101-2021_01_02_425006 529 47 TSSs tss NNS 10_1101-2021_01_02_425006 529 48 or or CC 10_1101-2021_01_02_425006 529 49 TF TF NNP 10_1101-2021_01_02_425006 529 50 binding bind VBG 10_1101-2021_01_02_425006 529 51 sites site NNS 10_1101-2021_01_02_425006 529 52 . . . 10_1101-2021_01_02_425006 530 1 591 591 CD 10_1101-2021_01_02_425006 530 2 dataset dataset VBD 10_1101-2021_01_02_425006 530 3 1 1 CD 10_1101-2021_01_02_425006 530 4 dataset dataset NN 10_1101-2021_01_02_425006 530 5 2 2 CD 10_1101-2021_01_02_425006 530 6 Source Source NNP 10_1101-2021_01_02_425006 530 7 Ju Ju NNP 10_1101-2021_01_02_425006 530 8 et et NNP 10_1101-2021_01_02_425006 530 9 al al NNP 10_1101-2021_01_02_425006 530 10 . . . 
10_1101-2021_01_02_425006 531 1 ( ( -LRB- 10_1101-2021_01_02_425006 531 2 7 7 LS 10_1101-2021_01_02_425006 531 3 ) ) -RRB- 10_1101-2021_01_02_425006 531 4 RegulonDB regulondb NN 10_1101-2021_01_02_425006 531 5 TF TF NNP 10_1101-2021_01_02_425006 531 6 binding bind VBG 10_1101-2021_01_02_425006 531 7 sites site NNS 10_1101-2021_01_02_425006 531 8 Technique Technique NNP 10_1101-2021_01_02_425006 531 9 SEnd SEnd NNP 10_1101-2021_01_02_425006 531 10 - - HYPH 10_1101-2021_01_02_425006 531 11 seq seq NNP 10_1101-2021_01_02_425006 531 12 Collection Collection NNP 10_1101-2021_01_02_425006 531 13 TSSs TSSs NNP 10_1101-2021_01_02_425006 531 14 / / SYM 10_1101-2021_01_02_425006 531 15 TF TF NNP 10_1101-2021_01_02_425006 531 16 binding bind VBG 10_1101-2021_01_02_425006 531 17 sites site NNS 10_1101-2021_01_02_425006 531 18 5,512 5,512 CD 10_1101-2021_01_02_425006 531 19 3,220 3,220 CD 10_1101-2021_01_02_425006 531 20 M9Enrich_Se M9Enrich_Se NNP 10_1101-2021_01_02_425006 531 21 q q NNP 10_1101-2021_01_02_425006 531 22 5’-end 5’-end CD 10_1101-2021_01_02_425006 531 23 genes gene NNS 10_1101-2021_01_02_425006 531 24 83 83 CD 10_1101-2021_01_02_425006 531 25 % % NN 10_1101-2021_01_02_425006 531 26 29 29 CD 10_1101-2021_01_02_425006 531 27 % % NN 10_1101-2021_01_02_425006 531 28 no no DT 10_1101-2021_01_02_425006 531 29 5’-end 5’-end CD 10_1101-2021_01_02_425006 531 30 genes gene NNS 10_1101-2021_01_02_425006 531 31 47 47 CD 10_1101-2021_01_02_425006 531 32 % % NN 10_1101-2021_01_02_425006 531 33 9.2 9.2 CD 10_1101-2021_01_02_425006 531 34 % % NN 10_1101-2021_01_02_425006 531 35 RiEnrich_Seq rienrich_seq NN 10_1101-2021_01_02_425006 531 36 5’-end 5’-end CD 10_1101-2021_01_02_425006 531 37 genes gene NNS 10_1101-2021_01_02_425006 531 38 89 89 CD 10_1101-2021_01_02_425006 531 39 % % NN 10_1101-2021_01_02_425006 531 40 30 30 CD 10_1101-2021_01_02_425006 531 41 % % NN 10_1101-2021_01_02_425006 531 42 no no DT 10_1101-2021_01_02_425006 531 43 5’-end 5’-end CD 
10_1101-2021_01_02_425006 531 44 genes gene NNS 10_1101-2021_01_02_425006 531 45 44 44 CD 10_1101-2021_01_02_425006 531 46 % % NN 10_1101-2021_01_02_425006 531 47 9.0 9.0 CD 10_1101-2021_01_02_425006 531 48 % % NN 10_1101-2021_01_02_425006 531 49 592 592 CD 10_1101-2021_01_02_425006 531 50 593 593 CD 10_1101-2021_01_02_425006 531 51 Table table NN 10_1101-2021_01_02_425006 531 52 2 2 CD 10_1101-2021_01_02_425006 531 53 . . . 10_1101-2021_01_02_425006 532 1 Results result NNS 10_1101-2021_01_02_425006 532 2 of of IN 10_1101-2021_01_02_425006 532 3 predicted predict VBN 10_1101-2021_01_02_425006 532 4 ATUs atu NNS 10_1101-2021_01_02_425006 532 5 verified verify VBN 10_1101-2021_01_02_425006 532 6 by by IN 10_1101-2021_01_02_425006 532 7 experimental experimental JJ 10_1101-2021_01_02_425006 532 8 TTSs tts NNS 10_1101-2021_01_02_425006 532 9 . . . 10_1101-2021_01_02_425006 533 1 Overview overview NN 10_1101-2021_01_02_425006 533 2 of of IN 10_1101-2021_01_02_425006 533 3 the the DT 10_1101-2021_01_02_425006 533 4 experimental experimental JJ 10_1101-2021_01_02_425006 533 5 594 594 CD 10_1101-2021_01_02_425006 533 6 TTS TTS NNP 10_1101-2021_01_02_425006 533 7 datasets dataset NNS 10_1101-2021_01_02_425006 533 8 ( ( -LRB- 10_1101-2021_01_02_425006 533 9 dataset dataset NNP 10_1101-2021_01_02_425006 533 10 3 3 CD 10_1101-2021_01_02_425006 533 11 and and CC 10_1101-2021_01_02_425006 533 12 dataset dataset VBD 10_1101-2021_01_02_425006 533 13 4 4 CD 10_1101-2021_01_02_425006 533 14 ) ) -RRB- 10_1101-2021_01_02_425006 533 15 and and CC 10_1101-2021_01_02_425006 533 16 the the DT 10_1101-2021_01_02_425006 533 17 proportion proportion NN 10_1101-2021_01_02_425006 533 18 of of IN 10_1101-2021_01_02_425006 533 19 3’-end 3’-end CD 10_1101-2021_01_02_425006 533 20 genes gene NNS 10_1101-2021_01_02_425006 533 21 and and CC 10_1101-2021_01_02_425006 533 22 no no DT 10_1101-2021_01_02_425006 533 23 3’-end 3’-end CD 10_1101-2021_01_02_425006 533 24 genes gene NNS 
10_1101-2021_01_02_425006 533 25 of of IN 10_1101-2021_01_02_425006 533 26 the the DT 10_1101-2021_01_02_425006 533 27 595 595 CD 10_1101-2021_01_02_425006 533 28 predicted predict VBN 10_1101-2021_01_02_425006 533 29 ATUs atu NNS 10_1101-2021_01_02_425006 533 30 by by IN 10_1101-2021_01_02_425006 533 31 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 533 32 for for IN 10_1101-2021_01_02_425006 533 33 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 533 34 and and CC 10_1101-2021_01_02_425006 533 35 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 533 36 , , , 10_1101-2021_01_02_425006 533 37 which which WDT 10_1101-2021_01_02_425006 533 38 were be VBD 10_1101-2021_01_02_425006 533 39 validated validate VBN 10_1101-2021_01_02_425006 533 40 by by IN 10_1101-2021_01_02_425006 533 41 596 596 CD 10_1101-2021_01_02_425006 533 42 experimental experimental JJ 10_1101-2021_01_02_425006 533 43 TTSs tts NNS 10_1101-2021_01_02_425006 533 44 . . . 10_1101-2021_01_02_425006 534 1 597 597 CD 
10_1101-2021_01_02_425006 535 25 dataset dataset NN 10_1101-2021_01_02_425006 535 26 3 3 CD 10_1101-2021_01_02_425006 535 27 dataset dataset NN 10_1101-2021_01_02_425006 535 28 4 4 CD 10_1101-2021_01_02_425006 535 29 Source Source NNP 10_1101-2021_01_02_425006 535 30 Ju Ju NNP 10_1101-2021_01_02_425006 535 31 et et NNP 10_1101-2021_01_02_425006 535 32 al al NNP 10_1101-2021_01_02_425006 535 33 . . . 
10_1101-2021_01_02_425006 536 1 ( ( -LRB- 10_1101-2021_01_02_425006 536 2 7 7 LS 10_1101-2021_01_02_425006 536 3 ) ) -RRB- 10_1101-2021_01_02_425006 536 4 RegulonDB regulondb NN 10_1101-2021_01_02_425006 536 5 TTSs TTSs NNPS 10_1101-2021_01_02_425006 536 6 Technique Technique NNP 10_1101-2021_01_02_425006 536 7 SEnd SEnd NNP 10_1101-2021_01_02_425006 536 8 - - HYPH 10_1101-2021_01_02_425006 536 9 seq seq NNP 10_1101-2021_01_02_425006 536 10 Collection Collection NNP 10_1101-2021_01_02_425006 536 11 TTSs TTSs NNPS 10_1101-2021_01_02_425006 536 12 1,540 1,540 CD 10_1101-2021_01_02_425006 536 13 3,67 3,67 CD 10_1101-2021_01_02_425006 536 14 M9Enrich_Se M9Enrich_Se NNP 10_1101-2021_01_02_425006 536 15 q q NNP 10_1101-2021_01_02_425006 536 16 3’-end 3’-end CD 10_1101-2021_01_02_425006 536 17 genes gene NNS 10_1101-2021_01_02_425006 536 18 51 51 CD 10_1101-2021_01_02_425006 536 19 % % NN 10_1101-2021_01_02_425006 536 20 11 11 CD 10_1101-2021_01_02_425006 536 21 % % NN 10_1101-2021_01_02_425006 536 22 no no DT 10_1101-2021_01_02_425006 536 23 3’-end 3’-end CD 10_1101-2021_01_02_425006 536 24 genes gene NNS 10_1101-2021_01_02_425006 536 25 15 15 CD 10_1101-2021_01_02_425006 536 26 % % NN 10_1101-2021_01_02_425006 536 27 5.2 5.2 CD 10_1101-2021_01_02_425006 536 28 % % NN 10_1101-2021_01_02_425006 536 29 RiEnrich_Seq rienrich_seq NN 10_1101-2021_01_02_425006 536 30 3’-end 3’-end CD 10_1101-2021_01_02_425006 536 31 genes gene NNS 10_1101-2021_01_02_425006 536 32 53 53 CD 10_1101-2021_01_02_425006 536 33 % % NN 10_1101-2021_01_02_425006 536 34 11 11 CD 10_1101-2021_01_02_425006 536 35 % % NN 10_1101-2021_01_02_425006 536 36 no no DT 10_1101-2021_01_02_425006 536 37 3’-end 3’-end CD 10_1101-2021_01_02_425006 536 38 genes gene NNS 10_1101-2021_01_02_425006 536 39 14 14 CD 10_1101-2021_01_02_425006 536 40 % % NN 10_1101-2021_01_02_425006 536 41 4.8 4.8 CD 10_1101-2021_01_02_425006 536 42 % % NN 10_1101-2021_01_02_425006 536 43 598 598 CD 10_1101-2021_01_02_425006 536 44 599 599 
CD 10_1101-2021_01_02_425006 536 45 600 600 CD 10_1101-2021_01_02_425006 536 46 Fig Fig NNP 10_1101-2021_01_02_425006 536 47 . . . 10_1101-2021_01_02_425006 537 1 1 1 LS 10_1101-2021_01_02_425006 537 2 . . . 10_1101-2021_01_02_425006 538 1 Schematic schematic JJ 10_1101-2021_01_02_425006 538 2 overview overview NN 10_1101-2021_01_02_425006 538 3 of of IN 10_1101-2021_01_02_425006 538 4 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 538 5 . . . 10_1101-2021_01_02_425006 539 1 The the DT 10_1101-2021_01_02_425006 539 2 blue blue JJ 10_1101-2021_01_02_425006 539 3 arrow arrow NN 10_1101-2021_01_02_425006 539 4 and and CC 10_1101-2021_01_02_425006 539 5 orange orange JJ 10_1101-2021_01_02_425006 539 6 line line NN 10_1101-2021_01_02_425006 539 7 denote denote NN 10_1101-2021_01_02_425006 539 8 gene gene NN 10_1101-2021_01_02_425006 539 9 and and CC 10_1101-2021_01_02_425006 539 10 RNA RNA NNP 10_1101-2021_01_02_425006 539 11 - - HYPH 10_1101-2021_01_02_425006 539 12 Seq Seq NNP 10_1101-2021_01_02_425006 539 13 601 601 CD 10_1101-2021_01_02_425006 539 14 read read VBD 10_1101-2021_01_02_425006 539 15 , , , 10_1101-2021_01_02_425006 539 16 respectively respectively RB 10_1101-2021_01_02_425006 539 17 . . . 
10_1101-2021_01_02_425006 540 1 The the DT 10_1101-2021_01_02_425006 540 2 preprocessing preprocessing NN 10_1101-2021_01_02_425006 540 3 stage stage NN 10_1101-2021_01_02_425006 540 4 requires require VBZ 10_1101-2021_01_02_425006 540 5 RNA RNA NNP 10_1101-2021_01_02_425006 540 6 - - HYPH 10_1101-2021_01_02_425006 540 7 Seq Seq NNP 10_1101-2021_01_02_425006 540 8 data datum NNS 10_1101-2021_01_02_425006 540 9 in in IN 10_1101-2021_01_02_425006 540 10 the the DT 10_1101-2021_01_02_425006 540 11 FASTQ FASTQ NNP 10_1101-2021_01_02_425006 540 12 format format NN 10_1101-2021_01_02_425006 540 13 , , , 10_1101-2021_01_02_425006 540 14 the the DT 10_1101-2021_01_02_425006 540 15 reference reference NN 10_1101-2021_01_02_425006 540 16 602 602 CD 
10_1101-2021_01_02_425006 541 25 genome genome JJ 10_1101-2021_01_02_425006 541 26 sequence sequence NN 10_1101-2021_01_02_425006 541 27 in in IN 10_1101-2021_01_02_425006 541 28 the the DT 10_1101-2021_01_02_425006 541 29 FASTA FASTA NNP 10_1101-2021_01_02_425006 541 30 format format NN 10_1101-2021_01_02_425006 541 31 , , , 10_1101-2021_01_02_425006 541 32 and and CC 10_1101-2021_01_02_425006 541 33 gene gene NN 10_1101-2021_01_02_425006 541 34 annotations annotation NNS 10_1101-2021_01_02_425006 541 35 in in IN 10_1101-2021_01_02_425006 541 36 the the DT 10_1101-2021_01_02_425006 541 37 GFF GFF NNP 10_1101-2021_01_02_425006 541 38 format format NN 10_1101-2021_01_02_425006 541 39 , , , 10_1101-2021_01_02_425006 541 40 generating generate VBG 10_1101-2021_01_02_425006 541 41 linear linear JJ 10_1101-2021_01_02_425006 541 42 603 603 CD 10_1101-2021_01_02_425006 541 43 constraints constraint NNS 10_1101-2021_01_02_425006 541 44 for for IN 10_1101-2021_01_02_425006 541 45 the the DT 10_1101-2021_01_02_425006 541 46 next next JJ 10_1101-2021_01_02_425006 541 47 convex convex NNP 10_1101-2021_01_02_425006 541 48 quadratic quadratic NNP 10_1101-2021_01_02_425006 541 49 programming programming NN 10_1101-2021_01_02_425006 541 50 ( ( -LRB- 10_1101-2021_01_02_425006 541 51 CQP CQP NNP 10_1101-2021_01_02_425006 541 52 ) ) -RRB- 10_1101-2021_01_02_425006 541 53 stage stage NN 
10_1101-2021_01_02_425006 541 54 . . . 10_1101-2021_01_02_425006 542 1 There there EX 10_1101-2021_01_02_425006 542 2 are be VBP 10_1101-2021_01_02_425006 542 3 two two CD 10_1101-2021_01_02_425006 542 4 steps step NNS 10_1101-2021_01_02_425006 542 5 in in IN 10_1101-2021_01_02_425006 542 6 the the DT 10_1101-2021_01_02_425006 542 8 preprocessing preprocessing NN 10_1101-2021_01_02_425006 542 9 stage stage NN 10_1101-2021_01_02_425006 542 10 : : : 10_1101-2021_01_02_425006 542 11 ( ( -LRB- 10_1101-2021_01_02_425006 542 12 i i NN 10_1101-2021_01_02_425006 542 13 ) ) -RRB- 10_1101-2021_01_02_425006 542 14 calculating calculate VBG 10_1101-2021_01_02_425006 542 15 the the DT 10_1101-2021_01_02_425006 542 16 expression expression NN 10_1101-2021_01_02_425006 542 17 value value NN 10_1101-2021_01_02_425006 542 18 of of IN 10_1101-2021_01_02_425006 542 19 the the DT 10_1101-2021_01_02_425006 542 20 genetic genetic JJ 10_1101-2021_01_02_425006 542 21 region region NN 10_1101-2021_01_02_425006 542 22 � � . 
10_1101-2021_01_02_425006 542 23 � � NN 10_1101-2021_01_02_425006 542 24 and and CC 10_1101-2021_01_02_425006 542 25 intergenic intergenic JJ 10_1101-2021_01_02_425006 542 26 region region NN 10_1101-2021_01_02_425006 542 28 � � NNP 10_1101-2021_01_02_425006 542 29 � � NNP 10_1101-2021_01_02_425006 542 30 , , , 10_1101-2021_01_02_425006 542 31 � � NNP 10_1101-2021_01_02_425006 542 32 and and CC 10_1101-2021_01_02_425006 542 33 ( ( -LRB- 10_1101-2021_01_02_425006 542 34 ii ii NNP 10_1101-2021_01_02_425006 542 35 ) ) -RRB- 10_1101-2021_01_02_425006 542 36 modelling model VBG 10_1101-2021_01_02_425006 542 37 non non JJ 10_1101-2021_01_02_425006 542 38 - - JJ 10_1101-2021_01_02_425006 542 39 uniform uniform JJ 10_1101-2021_01_02_425006 542 40 read read VB 10_1101-2021_01_02_425006 542 41 distribution distribution NN 10_1101-2021_01_02_425006 542 42 along along IN 10_1101-2021_01_02_425006 542 43 mRNA mRNA NNS 10_1101-2021_01_02_425006 542 44 transcripts transcript NNS 10_1101-2021_01_02_425006 542 45 ; ; , 10_1101-2021_01_02_425006 542 46 specifically specifically RB 10_1101-2021_01_02_425006 542 47 , , , 10_1101-2021_01_02_425006 542 48 we -PRON- PRP 10_1101-2021_01_02_425006 542 49 acquired acquire VBD 10_1101-2021_01_02_425006 542 51 a a DT 10_1101-2021_01_02_425006 542 52 bias bias NN 10_1101-2021_01_02_425006 542 53 rate rate NN 10_1101-2021_01_02_425006 542 54 function function NN 10_1101-2021_01_02_425006 542 55 � � , 10_1101-2021_01_02_425006 542 56 ( ( -LRB- 10_1101-2021_01_02_425006 542 57 � � NNP 10_1101-2021_01_02_425006 542 58 ) ) -RRB- 10_1101-2021_01_02_425006 542 59 = = NFP 10_1101-2021_01_02_425006 542 60 � � ADD 10_1101-2021_01_02_425006 542 61 � � , 10_1101-2021_01_02_425006 542 62 � � NNP 10_1101-2021_01_02_425006 542 63 using use VBG 10_1101-2021_01_02_425006 542 64 nonlinear nonlinear JJ 10_1101-2021_01_02_425006 542 65 regression regression NN 
10_1101-2021_01_02_425006 542 66 and and CC 10_1101-2021_01_02_425006 542 67 then then RB 10_1101-2021_01_02_425006 542 68 constructed construct VBN 10_1101-2021_01_02_425006 542 69 genetic genetic JJ 10_1101-2021_01_02_425006 542 70 or or CC 10_1101-2021_01_02_425006 542 71 intergenic intergenic JJ 10_1101-2021_01_02_425006 542 73 region region NN 10_1101-2021_01_02_425006 542 74 bias bias NN 10_1101-2021_01_02_425006 542 75 rate rate NN 10_1101-2021_01_02_425006 542 76 vectors vector NNS 10_1101-2021_01_02_425006 542 77 . . . 10_1101-2021_01_02_425006 543 1 The the DT 10_1101-2021_01_02_425006 543 2 maximal maximal JJ 10_1101-2021_01_02_425006 543 3 ATU ATU NNP 10_1101-2021_01_02_425006 543 4 cluster cluster NN 10_1101-2021_01_02_425006 543 5 data datum NNS 10_1101-2021_01_02_425006 543 6 determined determine VBN 10_1101-2021_01_02_425006 543 7 by by IN 10_1101-2021_01_02_425006 543 8 rSeqTU rseqtu CD 10_1101-2021_01_02_425006 543 9 and and CC 10_1101-2021_01_02_425006 543 10 the the DT 10_1101-2021_01_02_425006 543 11 linear linear JJ 10_1101-2021_01_02_425006 543 12 constraints constraint NNS 10_1101-2021_01_02_425006 543 14 from from IN 10_1101-2021_01_02_425006 543 15 preprocessing preprocesse VBG 10_1101-2021_01_02_425006 543 16 are be VBP 10_1101-2021_01_02_425006 543 17 both both DT 10_1101-2021_01_02_425006 543 18 taken take VBN 10_1101-2021_01_02_425006 543 19 as as IN 10_1101-2021_01_02_425006 543 20 inputs input NNS 10_1101-2021_01_02_425006 543 21 of of IN 10_1101-2021_01_02_425006 543 22 CQP CQP NNP 10_1101-2021_01_02_425006 543 23 . . . 
10_1101-2021_01_02_425006 544 1 CQP CQP NNP 10_1101-2021_01_02_425006 544 2 seeks seek VBZ 10_1101-2021_01_02_425006 544 3 the the DT 10_1101-2021_01_02_425006 544 4 optimum optimum JJ 10_1101-2021_01_02_425006 544 5 expression expression NN 10_1101-2021_01_02_425006 544 6 combination combination NN 10_1101-2021_01_02_425006 544 7 of of IN 10_1101-2021_01_02_425006 544 9 all all DT 10_1101-2021_01_02_425006 544 10 of of IN 10_1101-2021_01_02_425006 544 11 the the DT 10_1101-2021_01_02_425006 544 12 to to TO 10_1101-2021_01_02_425006 544 13 - - HYPH 10_1101-2021_01_02_425006 544 14 be be VB 10_1101-2021_01_02_425006 544 15 - - HYPH 10_1101-2021_01_02_425006 544 16 identified identify VBN 10_1101-2021_01_02_425006 544 17 ATUs atu NNS 10_1101-2021_01_02_425006 544 18 to to TO 10_1101-2021_01_02_425006 544 19 minimize minimize VB 10_1101-2021_01_02_425006 544 20 the the DT 10_1101-2021_01_02_425006 544 21 gap gap NN 10_1101-2021_01_02_425006 544 22 � � NN 10_1101-2021_01_02_425006 544 23 � � NNP 10_1101-2021_01_02_425006 544 24 � � NNS 10_1101-2021_01_02_425006 544 25 between between IN 10_1101-2021_01_02_425006 544 26 the the DT 10_1101-2021_01_02_425006 544 27 predicted predict VBN 10_1101-2021_01_02_425006 544 28 ATU ATU NNP 10_1101-2021_01_02_425006 544 29 expression expression NN 10_1101-2021_01_02_425006 544 30 profile profile NN 10_1101-2021_01_02_425006 544 32 and and CC 10_1101-2021_01_02_425006 544 33 the the DT 10_1101-2021_01_02_425006 544 34 genetic genetic JJ 10_1101-2021_01_02_425006 544 35 and and CC 10_1101-2021_01_02_425006 544 36 intergenic intergenic JJ 10_1101-2021_01_02_425006 544 37 region region NN 10_1101-2021_01_02_425006 544 38 expression expression NN 10_1101-2021_01_02_425006 544 39 profile profile NN 10_1101-2021_01_02_425006 544 40 . . . 
10_1101-2021_01_02_425006 545 1 Finally finally RB 10_1101-2021_01_02_425006 545 2 , , , 10_1101-2021_01_02_425006 545 3 the the DT 10_1101-2021_01_02_425006 545 4 output output NN 10_1101-2021_01_02_425006 545 5 of of IN 10_1101-2021_01_02_425006 545 6 CQP CQP NNP 10_1101-2021_01_02_425006 545 7 is be VBZ 10_1101-2021_01_02_425006 545 8 the the DT 10_1101-2021_01_02_425006 545 9 predicted predict VBN 10_1101-2021_01_02_425006 545 11 ATUs atu NNS 10_1101-2021_01_02_425006 545 12 . . . 
10_1101-2021_01_02_425006 547 26 Fig Fig NNP 10_1101-2021_01_02_425006 547 27 . . . 10_1101-2021_01_02_425006 548 1 2 2 LS 10_1101-2021_01_02_425006 548 2 . . . 
10_1101-2021_01_02_425006 549 1 Results result NNS 10_1101-2021_01_02_425006 549 2 of of IN 10_1101-2021_01_02_425006 549 3 modelling model VBG 10_1101-2021_01_02_425006 549 4 non non JJ 10_1101-2021_01_02_425006 549 5 - - JJ 10_1101-2021_01_02_425006 549 6 uniform uniform JJ 10_1101-2021_01_02_425006 549 7 read read VB 10_1101-2021_01_02_425006 549 8 distribution distribution NN 10_1101-2021_01_02_425006 549 9 along along IN 10_1101-2021_01_02_425006 549 10 mRNA mRNA NNP 10_1101-2021_01_02_425006 549 11 transcripts transcript NNS 10_1101-2021_01_02_425006 549 12 . . . 10_1101-2021_01_02_425006 550 1 The the DT 10_1101-2021_01_02_425006 550 2 four four CD 10_1101-2021_01_02_425006 550 3 bias bias NN 10_1101-2021_01_02_425006 550 5 rate rate NN 10_1101-2021_01_02_425006 550 6 functions function NNS 10_1101-2021_01_02_425006 550 7 ( ( -LRB- 10_1101-2021_01_02_425006 550 8 � � NNP 10_1101-2021_01_02_425006 550 9 = = SYM 10_1101-2021_01_02_425006 550 10 � � NNP 10_1101-2021_01_02_425006 550 11 � � VBZ 10_1101-2021_01_02_425006 550 12 � � NNS 10_1101-2021_01_02_425006 550 13 � � NN 10_1101-2021_01_02_425006 550 14 ) ) -RRB- 10_1101-2021_01_02_425006 550 15 by by IN 10_1101-2021_01_02_425006 550 16 nonlinear nonlinear JJ 10_1101-2021_01_02_425006 550 17 regression regression NN 10_1101-2021_01_02_425006 550 18 had have VBD 10_1101-2021_01_02_425006 550 19 similar similar JJ 10_1101-2021_01_02_425006 550 20 coefficients coefficient NNS 10_1101-2021_01_02_425006 550 21 ( ( -LRB- 10_1101-2021_01_02_425006 550 22 � � NNP 10_1101-2021_01_02_425006 550 23 and and CC 10_1101-2021_01_02_425006 550 24 � � NNP 10_1101-2021_01_02_425006 550 25 ) ) -RRB- 10_1101-2021_01_02_425006 550 26 across across IN 10_1101-2021_01_02_425006 550 27 the the DT 10_1101-2021_01_02_425006 550 28 four four CD 10_1101-2021_01_02_425006 550 30 datasets dataset NNS 10_1101-2021_01_02_425006 550 31 M9Enrich_1 M9Enrich_1 NNP 10_1101-2021_01_02_425006 550 32 , , , 10_1101-2021_01_02_425006 550 33 M9Enrich_2 M9Enrich_2 NNP 10_1101-2021_01_02_425006 550 34 , , , 10_1101-2021_01_02_425006 550 35 RiEnrich_1 RiEnrich_1 NNP 10_1101-2021_01_02_425006 550 36 and and CC 10_1101-2021_01_02_425006 550 37 RiEnrich_2 RiEnrich_2 NNP 10_1101-2021_01_02_425006 550 38 . . . 
10_1101-2021_01_02_425006 552 26 Fig fig NN 10_1101-2021_01_02_425006 552 27 . . . 10_1101-2021_01_02_425006 553 1 3 3 LS 10_1101-2021_01_02_425006 553 2 . . . 
10_1101-2021_01_02_425006 554 1 Overall overall JJ 10_1101-2021_01_02_425006 554 2 evaluation evaluation NN 10_1101-2021_01_02_425006 554 3 results result NNS 10_1101-2021_01_02_425006 554 4 of of IN 10_1101-2021_01_02_425006 554 5 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 554 6 . . . 10_1101-2021_01_02_425006 555 1 ( ( -LRB- 10_1101-2021_01_02_425006 555 2 A a DT 10_1101-2021_01_02_425006 555 3 ) ) -RRB- 10_1101-2021_01_02_425006 555 4 Precision precision NN 10_1101-2021_01_02_425006 555 5 and and CC 10_1101-2021_01_02_425006 555 6 recall recall NN 10_1101-2021_01_02_425006 555 7 based base VBN 10_1101-2021_01_02_425006 555 8 on on IN 10_1101-2021_01_02_425006 555 9 perfect perfect JJ 10_1101-2021_01_02_425006 555 10 matching matching NN 10_1101-2021_01_02_425006 555 11 and and CC 10_1101-2021_01_02_425006 555 13 relaxed relaxed JJ 10_1101-2021_01_02_425006 555 14 matching matching NN 10_1101-2021_01_02_425006 555 15 for for IN 10_1101-2021_01_02_425006 555 16 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 555 17 ( ( -LRB- 10_1101-2021_01_02_425006 555 18 left left RB 10_1101-2021_01_02_425006 555 19 ) ) -RRB- 10_1101-2021_01_02_425006 555 20 and and CC 10_1101-2021_01_02_425006 555 21 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 555 22 ( ( -LRB- 10_1101-2021_01_02_425006 555 23 right right UH 10_1101-2021_01_02_425006 555 24 ) ) -RRB- 10_1101-2021_01_02_425006 555 25 using use VBG 10_1101-2021_01_02_425006 555 26 evaluated evaluate VBN 10_1101-2021_01_02_425006 555 27 ATUs atu NNS 10_1101-2021_01_02_425006 555 28 from from IN 10_1101-2021_01_02_425006 555 29 SMRT- smrt- JJ 
10_1101-2021_01_02_425006 556 25 Cappable cappable JJ 10_1101-2021_01_02_425006 556 26 - - HYPH 10_1101-2021_01_02_425006 556 27 seq seq NN 10_1101-2021_01_02_425006 556 28 . . . 
10_1101-2021_01_02_425006 557 1 ( ( -LRB- 10_1101-2021_01_02_425006 557 2 B b NN 10_1101-2021_01_02_425006 557 3 ) ) -RRB- 10_1101-2021_01_02_425006 557 4 Average average JJ 10_1101-2021_01_02_425006 557 5 precision precision NN 10_1101-2021_01_02_425006 557 6 based base VBN 10_1101-2021_01_02_425006 557 7 on on IN 10_1101-2021_01_02_425006 557 8 perfect perfect JJ 10_1101-2021_01_02_425006 557 9 matching matching NN 10_1101-2021_01_02_425006 557 10 for for IN 10_1101-2021_01_02_425006 557 11 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 557 12 ( ( -LRB- 10_1101-2021_01_02_425006 557 13 left left RB 10_1101-2021_01_02_425006 557 14 ) ) -RRB- 10_1101-2021_01_02_425006 557 15 and and CC 10_1101-2021_01_02_425006 557 17 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 557 18 ( ( -LRB- 10_1101-2021_01_02_425006 557 19 right right UH 10_1101-2021_01_02_425006 557 20 ) ) -RRB- 10_1101-2021_01_02_425006 557 21 using use VBG 10_1101-2021_01_02_425006 557 22 evaluated evaluate VBN 10_1101-2021_01_02_425006 557 23 ATUs atu NNS 10_1101-2021_01_02_425006 557 24 from from IN 10_1101-2021_01_02_425006 557 25 SMRT SMRT NNP 10_1101-2021_01_02_425006 557 26 - - HYPH 10_1101-2021_01_02_425006 557 27 Cappable cappable JJ 10_1101-2021_01_02_425006 557 28 - - HYPH 10_1101-2021_01_02_425006 557 29 seq seq NN 10_1101-2021_01_02_425006 557 30 ( ( -LRB- 10_1101-2021_01_02_425006 557 31 black black JJ 10_1101-2021_01_02_425006 557 32 ) ) -RRB- 10_1101-2021_01_02_425006 557 33 and and CC 10_1101-2021_01_02_425006 557 34 evaluated evaluate VBN 10_1101-2021_01_02_425006 557 35 ATUs atu NNS 10_1101-2021_01_02_425006 557 36 from from IN 10_1101-2021_01_02_425006 557 38 SMRT smrt JJ 10_1101-2021_01_02_425006 557 39 - - HYPH 10_1101-2021_01_02_425006 557 40 Cappable cappable JJ 10_1101-2021_01_02_425006 557 41 - - HYPH 10_1101-2021_01_02_425006 557 42 seq seq NN 10_1101-2021_01_02_425006 557 43 and and CC 10_1101-2021_01_02_425006 557 44 SEnd SEnd NNP 10_1101-2021_01_02_425006 557 45 - - HYPH 10_1101-2021_01_02_425006 557 46 seq seq NNP 10_1101-2021_01_02_425006 557 47 ( ( -LRB- 10_1101-2021_01_02_425006 557 48 red red NNP 10_1101-2021_01_02_425006 557 49 ) ) -RRB- 10_1101-2021_01_02_425006 557 50 . . . 
10_1101-2021_01_02_425006 558 1 The the DT 10_1101-2021_01_02_425006 558 2 magnitude magnitude NN 10_1101-2021_01_02_425006 558 3 of of IN 10_1101-2021_01_02_425006 558 4 the the DT 10_1101-2021_01_02_425006 558 5 point point NN 10_1101-2021_01_02_425006 558 6 denotes denote VBZ 10_1101-2021_01_02_425006 558 7 the the DT 10_1101-2021_01_02_425006 558 8 number number NN 10_1101-2021_01_02_425006 558 9 of of IN 10_1101-2021_01_02_425006 558 10 maximal maximal JJ 10_1101-2021_01_02_425006 558 12 ATU ATU NNP 10_1101-2021_01_02_425006 558 13 clusters cluster NNS 10_1101-2021_01_02_425006 558 14 with with IN 10_1101-2021_01_02_425006 558 15 same same JJ 10_1101-2021_01_02_425006 558 16 size size NN 10_1101-2021_01_02_425006 558 17 . . . 
10_1101-2021_01_02_425006 559 1 ( ( -LRB- 10_1101-2021_01_02_425006 559 2 C c NN 10_1101-2021_01_02_425006 559 3 ) ) -RRB- 10_1101-2021_01_02_425006 559 4 Average average JJ 10_1101-2021_01_02_425006 559 5 number number NN 10_1101-2021_01_02_425006 559 6 of of IN 10_1101-2021_01_02_425006 559 7 ATUs atu NNS 10_1101-2021_01_02_425006 559 8 across across IN 10_1101-2021_01_02_425006 559 9 different different JJ 10_1101-2021_01_02_425006 559 10 sizes size NNS 10_1101-2021_01_02_425006 559 11 of of IN 10_1101-2021_01_02_425006 559 12 SMRT SMRT NNP 10_1101-2021_01_02_425006 559 13 maximal maximal JJ 10_1101-2021_01_02_425006 559 15 ATU ATU NNP 10_1101-2021_01_02_425006 559 16 clusters cluster NNS 10_1101-2021_01_02_425006 559 17 for for IN 10_1101-2021_01_02_425006 559 18 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 559 19 ( ( -LRB- 10_1101-2021_01_02_425006 559 20 left left RB 10_1101-2021_01_02_425006 559 21 ) ) -RRB- 10_1101-2021_01_02_425006 559 22 and and CC 10_1101-2021_01_02_425006 559 23 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 559 24 ( ( -LRB- 10_1101-2021_01_02_425006 559 25 right right RB 10_1101-2021_01_02_425006 559 26 ) ) -RRB- 10_1101-2021_01_02_425006 559 27 . . . 
10_1101-2021_01_02_425006 561 26 Fig fig NN 10_1101-2021_01_02_425006 561 27 . . . 10_1101-2021_01_02_425006 562 1 4 4 LS 10_1101-2021_01_02_425006 562 2 . . . 
10_1101-2021_01_02_425006 563 1 Comparative comparative JJ 10_1101-2021_01_02_425006 563 2 analysis analysis NN 10_1101-2021_01_02_425006 563 3 of of IN 10_1101-2021_01_02_425006 563 4 the the DT 10_1101-2021_01_02_425006 563 5 performance performance NN 10_1101-2021_01_02_425006 563 6 between between IN 10_1101-2021_01_02_425006 563 7 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 563 8 and and CC 10_1101-2021_01_02_425006 563 9 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 563 10 without without IN 10_1101-2021_01_02_425006 563 11 the the DT 10_1101-2021_01_02_425006 563 12 bias bias NN 10_1101-2021_01_02_425006 563 14 rate rate NN 10_1101-2021_01_02_425006 563 15 constraints constraint NNS 10_1101-2021_01_02_425006 563 16 for for IN 10_1101-2021_01_02_425006 563 17 SMRT SMRT NNP 10_1101-2021_01_02_425006 563 18 maximal maximal JJ 10_1101-2021_01_02_425006 563 19 ATU ATU NNP 10_1101-2021_01_02_425006 563 20 clusters cluster NNS 10_1101-2021_01_02_425006 563 21 . . . 10_1101-2021_01_02_425006 564 1 ( ( -LRB- 10_1101-2021_01_02_425006 564 2 A a DT 10_1101-2021_01_02_425006 564 3 ) ) -RRB- 10_1101-2021_01_02_425006 564 4 Precision precision NN 10_1101-2021_01_02_425006 564 5 , , , 10_1101-2021_01_02_425006 564 6 recall recall NN 10_1101-2021_01_02_425006 564 7 and and CC 10_1101-2021_01_02_425006 564 8 F f NN 10_1101-2021_01_02_425006 564 9 - - HYPH 10_1101-2021_01_02_425006 564 10 score score NN 10_1101-2021_01_02_425006 564 11 based base VBN 10_1101-2021_01_02_425006 564 12 on on IN 10_1101-2021_01_02_425006 564 13 perfect perfect JJ 10_1101-2021_01_02_425006 564 15 matching matching NN 10_1101-2021_01_02_425006 564 16 for for IN 10_1101-2021_01_02_425006 564 17 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 564 18 and and CC 10_1101-2021_01_02_425006 564 19 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 564 20 . . . 
10_1101-2021_01_02_425006 565 1 ( ( -LRB- 10_1101-2021_01_02_425006 565 2 B B NNP 10_1101-2021_01_02_425006 565 3 ) ) -RRB- 10_1101-2021_01_02_425006 565 4 Precision Precision NNP 10_1101-2021_01_02_425006 565 5 , , , 10_1101-2021_01_02_425006 565 6 recall recall NN 10_1101-2021_01_02_425006 565 7 and and CC 10_1101-2021_01_02_425006 565 8 F f NN 10_1101-2021_01_02_425006 565 9 - - HYPH 10_1101-2021_01_02_425006 565 10 score score NN 10_1101-2021_01_02_425006 565 11 based base VBN 10_1101-2021_01_02_425006 565 12 on on IN 10_1101-2021_01_02_425006 565 13 relaxed relaxed JJ 10_1101-2021_01_02_425006 565 14 628 628 CD 10_1101-2021_01_02_425006 565 15 matching match VBG 10_1101-2021_01_02_425006 565 16 for for IN 10_1101-2021_01_02_425006 565 17 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 565 18 and and CC 10_1101-2021_01_02_425006 565 19 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 565 20 . . . 
10_1101-2021_01_02_425006 567 26 Fig Fig NNP 10_1101-2021_01_02_425006 567 27 . . . 10_1101-2021_01_02_425006 568 1 5 5 CD 10_1101-2021_01_02_425006 568 2 . . . 10_1101-2021_01_02_425006 569 1 Comprehensive comprehensive JJ 10_1101-2021_01_02_425006 569 2 analysis analysis NN 10_1101-2021_01_02_425006 569 3 of of IN 10_1101-2021_01_02_425006 569 4 the the DT 10_1101-2021_01_02_425006 569 5 predicted predict VBN 10_1101-2021_01_02_425006 569 6 ATUs ATUs NNPS 10_1101-2021_01_02_425006 569 7 by by IN 10_1101-2021_01_02_425006 569 8 SeqATU SeqATU NNP 10_1101-2021_01_02_425006 569 9 . . . 10_1101-2021_01_02_425006 570 1 ( ( -LRB- 10_1101-2021_01_02_425006 570 2 A a DT 10_1101-2021_01_02_425006 570 3 ) ) -RRB- 10_1101-2021_01_02_425006 570 4 Number number NN 10_1101-2021_01_02_425006 570 5 of of IN 10_1101-2021_01_02_425006 570 6 ATUs ATUs NNPS 10_1101-2021_01_02_425006 570 7 across across IN 10_1101-2021_01_02_425006 570 8 631 631 CD 10_1101-2021_01_02_425006 570 9 different different JJ 10_1101-2021_01_02_425006 570 10 sizes size NNS 10_1101-2021_01_02_425006 570 11 . . . 
10_1101-2021_01_02_425006 571 1 The the DT 10_1101-2021_01_02_425006 571 2 size size NN 10_1101-2021_01_02_425006 571 3 of of IN 10_1101-2021_01_02_425006 571 4 an an DT 10_1101-2021_01_02_425006 571 5 ATU ATU NNP 10_1101-2021_01_02_425006 571 6 is be VBZ 10_1101-2021_01_02_425006 571 7 the the DT 10_1101-2021_01_02_425006 571 8 number number NN 10_1101-2021_01_02_425006 571 9 of of IN 10_1101-2021_01_02_425006 571 10 its -PRON- PRP$ 10_1101-2021_01_02_425006 571 11 component component NN 10_1101-2021_01_02_425006 571 12 genes gene NNS 10_1101-2021_01_02_425006 571 13 . . . 10_1101-2021_01_02_425006 572 1 ( ( -LRB- 10_1101-2021_01_02_425006 572 2 B b NN 10_1101-2021_01_02_425006 572 3 ) ) -RRB- 10_1101-2021_01_02_425006 572 4 Distribution distribution NN 10_1101-2021_01_02_425006 572 5 of of IN 10_1101-2021_01_02_425006 572 6 the the DT 10_1101-2021_01_02_425006 572 7 number number NN 10_1101-2021_01_02_425006 572 8 632 632 CD 10_1101-2021_01_02_425006 572 9 of of IN 10_1101-2021_01_02_425006 572 10 ATUs atu NNS 10_1101-2021_01_02_425006 572 11 per per IN 10_1101-2021_01_02_425006 572 12 gene gene NN 10_1101-2021_01_02_425006 572 13 . . . 
10_1101-2021_01_02_425006 574 26 Fig fig NN 10_1101-2021_01_02_425006 574 27 . . . 10_1101-2021_01_02_425006 575 1 6 6 CD 10_1101-2021_01_02_425006 575 2 . . . 
10_1101-2021_01_02_425006 576 1 Integrative Integrative NNP 10_1101-2021_01_02_425006 576 2 Genomics Genomics NNP 10_1101-2021_01_02_425006 576 3 Viewer Viewer NNP 10_1101-2021_01_02_425006 576 4 ( ( -LRB- 10_1101-2021_01_02_425006 576 5 IGV IGV NNP 10_1101-2021_01_02_425006 576 6 ) ) -RRB- 10_1101-2021_01_02_425006 576 7 representation representation NN 10_1101-2021_01_02_425006 576 8 of of IN 10_1101-2021_01_02_425006 576 9 the the DT 10_1101-2021_01_02_425006 576 10 mapping mapping NN 10_1101-2021_01_02_425006 576 11 and and CC 10_1101-2021_01_02_425006 576 12 ATUs atu NNS 10_1101-2021_01_02_425006 576 13 . . . 10_1101-2021_01_02_425006 577 1 Mapping mapping NN 10_1101-2021_01_02_425006 577 2 and and CC 10_1101-2021_01_02_425006 577 3 635 635 CD 10_1101-2021_01_02_425006 577 4 ATUs atu NNS 10_1101-2021_01_02_425006 577 5 of of IN 10_1101-2021_01_02_425006 577 6 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 577 7 ( ( -LRB- 10_1101-2021_01_02_425006 577 8 orange orange NNP 10_1101-2021_01_02_425006 577 9 ) ) -RRB- 10_1101-2021_01_02_425006 577 10 and and CC 10_1101-2021_01_02_425006 577 11 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 577 12 ( ( -LRB- 10_1101-2021_01_02_425006 577 13 blue blue NNP 10_1101-2021_01_02_425006 577 14 ) ) -RRB- 10_1101-2021_01_02_425006 577 15 were be VBD 10_1101-2021_01_02_425006 577 16 shown show VBN 10_1101-2021_01_02_425006 577 17 for for IN 10_1101-2021_01_02_425006 577 18 the the DT 10_1101-2021_01_02_425006 577 19 maximal maximal JJ 10_1101-2021_01_02_425006 577 20 ATU ATU NNP 10_1101-2021_01_02_425006 577 21 cluster cluster NN 10_1101-2021_01_02_425006 577 22 636 636 CD 10_1101-2021_01_02_425006 577 23 containing contain VBG 10_1101-2021_01_02_425006 577 24 the the DT 10_1101-2021_01_02_425006 577 25 bioB bioB NNS 10_1101-2021_01_02_425006 577 26 , , , 10_1101-2021_01_02_425006 577 27 bioF biof NN 10_1101-2021_01_02_425006 577 28 , , , 10_1101-2021_01_02_425006 577 29 bioC bioC NNS 10_1101-2021_01_02_425006 
577 30 , , , 10_1101-2021_01_02_425006 577 31 bioD bioD NNS 10_1101-2021_01_02_425006 577 32 and and CC 10_1101-2021_01_02_425006 577 33 uvrB uvrb JJ 10_1101-2021_01_02_425006 577 34 genes gene NNS 10_1101-2021_01_02_425006 577 35 . . . 
10_1101-2021_01_02_425006 579 26 Fig fig NN 10_1101-2021_01_02_425006 579 27 . . . 10_1101-2021_01_02_425006 580 1 7 7 LS 10_1101-2021_01_02_425006 580 2 . . . 
10_1101-2021_01_02_425006 581 1 Interpretation interpretation NN 10_1101-2021_01_02_425006 581 2 and and CC 10_1101-2021_01_02_425006 581 3 results result NNS 10_1101-2021_01_02_425006 581 4 of of IN 10_1101-2021_01_02_425006 581 5 the the DT 10_1101-2021_01_02_425006 581 6 functional functional JJ 10_1101-2021_01_02_425006 581 7 relatedness relatedness NN 10_1101-2021_01_02_425006 581 8 of of IN 10_1101-2021_01_02_425006 581 9 different different JJ 10_1101-2021_01_02_425006 581 10 gene gene NN 10_1101-2021_01_02_425006 581 11 pairs pair NNS 10_1101-2021_01_02_425006 581 12 based base VBN 10_1101-2021_01_02_425006 581 13 on on IN 10_1101-2021_01_02_425006 581 14 GO GO NNP 10_1101-2021_01_02_425006 581 15 639 639 CD 10_1101-2021_01_02_425006 581 16 and and CC 10_1101-2021_01_02_425006 581 17 KEGG KEGG NNP 10_1101-2021_01_02_425006 581 18 enrichment enrichment NN 10_1101-2021_01_02_425006 581 19 analyses analysis NNS 10_1101-2021_01_02_425006 581 20 . . . 10_1101-2021_01_02_425006 582 1 ( ( -LRB- 10_1101-2021_01_02_425006 582 2 A a NN 10_1101-2021_01_02_425006 582 3 ) ) -RRB- 10_1101-2021_01_02_425006 582 4 Illustration illustration NN 10_1101-2021_01_02_425006 582 5 of of IN 10_1101-2021_01_02_425006 582 6 two two CD 10_1101-2021_01_02_425006 582 7 different different JJ 10_1101-2021_01_02_425006 582 8 gene gene NN 10_1101-2021_01_02_425006 582 9 pairs pair NNS 10_1101-2021_01_02_425006 582 10 i i PRP 10_1101-2021_01_02_425006 582 11 and and CC 10_1101-2021_01_02_425006 582 12 ii ii NNP 10_1101-2021_01_02_425006 582 13 . . . 
10_1101-2021_01_02_425006 583 1 ( ( -LRB- 10_1101-2021_01_02_425006 583 2 B B NNP 10_1101-2021_01_02_425006 583 3 ) ) -RRB- 10_1101-2021_01_02_425006 583 4 Functional Functional NNP 10_1101-2021_01_02_425006 583 5 640 640 CD 10_1101-2021_01_02_425006 583 6 relatedness relatedness NN 10_1101-2021_01_02_425006 583 7 results result NNS 10_1101-2021_01_02_425006 583 8 based base VBN 10_1101-2021_01_02_425006 583 9 on on IN 10_1101-2021_01_02_425006 583 10 GO GO NNP 10_1101-2021_01_02_425006 583 11 enrichment enrichment NN 10_1101-2021_01_02_425006 583 12 analysis analysis NN 10_1101-2021_01_02_425006 583 13 for for IN 10_1101-2021_01_02_425006 583 14 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 583 15 ( ( -LRB- 10_1101-2021_01_02_425006 583 16 left left RB 10_1101-2021_01_02_425006 583 17 ) ) -RRB- 10_1101-2021_01_02_425006 583 18 and and CC 10_1101-2021_01_02_425006 583 19 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 583 20 ( ( -LRB- 10_1101-2021_01_02_425006 583 21 right right RB 10_1101-2021_01_02_425006 583 22 ) ) -RRB- 10_1101-2021_01_02_425006 583 23 . . . 
10_1101-2021_01_02_425006 584 1 641 641 CD 10_1101-2021_01_02_425006 584 2 ( ( -LRB- 10_1101-2021_01_02_425006 584 3 C C NNP 10_1101-2021_01_02_425006 584 4 ) ) -RRB- 10_1101-2021_01_02_425006 584 5 The the DT 10_1101-2021_01_02_425006 584 6 proportion proportion NN 10_1101-2021_01_02_425006 584 7 of of IN 10_1101-2021_01_02_425006 584 8 two two CD 10_1101-2021_01_02_425006 584 9 different different JJ 10_1101-2021_01_02_425006 584 10 gene gene NN 10_1101-2021_01_02_425006 584 11 pairs pair NNS 10_1101-2021_01_02_425006 584 12 whose whose WP$ 10_1101-2021_01_02_425006 584 13 genes gene NNS 10_1101-2021_01_02_425006 584 14 are be VBP 10_1101-2021_01_02_425006 584 15 contained contain VBN 10_1101-2021_01_02_425006 584 16 in in IN 10_1101-2021_01_02_425006 584 17 the the DT 10_1101-2021_01_02_425006 584 18 same same JJ 10_1101-2021_01_02_425006 584 19 KEGG KEGG NNP 10_1101-2021_01_02_425006 584 20 pathway pathway NN 10_1101-2021_01_02_425006 584 21 642 642 CD 10_1101-2021_01_02_425006 584 22 for for IN 10_1101-2021_01_02_425006 584 23 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 584 24 ( ( -LRB- 10_1101-2021_01_02_425006 584 25 left left RB 10_1101-2021_01_02_425006 584 26 ) ) -RRB- 10_1101-2021_01_02_425006 584 27 and and CC 10_1101-2021_01_02_425006 584 28 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 584 29 ( ( -LRB- 10_1101-2021_01_02_425006 584 30 right right RB 10_1101-2021_01_02_425006 584 31 ) ) -RRB- 10_1101-2021_01_02_425006 584 32 . . . 
10_1101-2021_01_02_425006 585 1 ( ( -LRB- 10_1101-2021_01_02_425006 585 2 D d NN 10_1101-2021_01_02_425006 585 3 ) ) -RRB- 10_1101-2021_01_02_425006 585 4 The the DT 10_1101-2021_01_02_425006 585 5 functional functional JJ 10_1101-2021_01_02_425006 585 6 relatedness relatedness NN 10_1101-2021_01_02_425006 585 7 results result NNS 10_1101-2021_01_02_425006 585 8 based base VBN 10_1101-2021_01_02_425006 585 9 on on IN 10_1101-2021_01_02_425006 585 10 643 643 CD 10_1101-2021_01_02_425006 585 11 KEGG KEGG NNP 10_1101-2021_01_02_425006 585 12 enrichment enrichment NN 10_1101-2021_01_02_425006 585 13 analysis analysis NN 10_1101-2021_01_02_425006 585 14 for for IN 10_1101-2021_01_02_425006 585 15 M9Enrich_Seq M9Enrich_Seq NNP 10_1101-2021_01_02_425006 585 16 ( ( -LRB- 10_1101-2021_01_02_425006 585 17 left left RB 10_1101-2021_01_02_425006 585 18 ) ) -RRB- 10_1101-2021_01_02_425006 585 19 and and CC 10_1101-2021_01_02_425006 585 20 RiEnrich_Seq RiEnrich_Seq NNP 10_1101-2021_01_02_425006 585 21 ( ( -LRB- 10_1101-2021_01_02_425006 585 22 right right RB 10_1101-2021_01_02_425006 585 23 ) ) -RRB- 10_1101-2021_01_02_425006 585 24 . . . 