id sid tid token lemma pos
zp38w953f1h 1 1 genomics genomic NOUN
zp38w953f1h 1 2 , , PUNCT
zp38w953f1h 1 3 or or CCONJ
zp38w953f1h 1 4 the the DET
zp38w953f1h 1 5 study study NOUN
zp38w953f1h 1 6 of of ADP
zp38w953f1h 1 7 genome genome NOUN
zp38w953f1h 1 8 - - PUNCT
zp38w953f1h 1 9 derived derive VERB
zp38w953f1h 1 10 data datum NOUN
zp38w953f1h 1 11 , , PUNCT
zp38w953f1h 1 12 has have AUX
zp38w953f1h 1 13 had have VERB
zp38w953f1h 1 14 widespread widespread ADJ
zp38w953f1h 1 15 impact impact NOUN
zp38w953f1h 1 16 in in ADP
zp38w953f1h 1 17 applications application NOUN
zp38w953f1h 1 18 including include VERB
zp38w953f1h 1 19 medicine medicine NOUN
zp38w953f1h 1 20 , , PUNCT
zp38w953f1h 1 21 forensic forensic ADJ
zp38w953f1h 1 22 science science NOUN
zp38w953f1h 1 23 , , PUNCT
zp38w953f1h 1 24 human human ADJ
zp38w953f1h 1 25 evolution evolution NOUN
zp38w953f1h 1 26 , , PUNCT
zp38w953f1h 1 27 environmental environmental ADJ
zp38w953f1h 1 28 science science NOUN
zp38w953f1h 1 29 , , PUNCT
zp38w953f1h 1 30 and and CCONJ
zp38w953f1h 1 31 social social ADJ
zp38w953f1h 1 32 science science NOUN
zp38w953f1h 1 33 . . PUNCT
zp38w953f1h 2 1 the the DET
zp38w953f1h 2 2 plummeting plummet VERB
zp38w953f1h 2 3 cost cost NOUN
zp38w953f1h 2 4 of of ADP
zp38w953f1h 2 5 genome genome NOUN
zp38w953f1h 2 6 sequencing sequence VERB
zp38w953f1h 2 7 in in ADP
zp38w953f1h 2 8 the the DET
zp38w953f1h 2 9 last last ADJ
zp38w953f1h 2 10 decade decade NOUN
zp38w953f1h 2 11 has have AUX
zp38w953f1h 2 12 spurred spur VERB
zp38w953f1h 2 13 an an DET
zp38w953f1h 2 14 exponential exponential ADJ
zp38w953f1h 2 15 growth growth NOUN
zp38w953f1h 2 16 of of ADP
zp38w953f1h 2 17 genomic genomic ADJ
zp38w953f1h 2 18 data datum NOUN
zp38w953f1h 2 19 . . PUNCT
zp38w953f1h 3 1 the the DET
zp38w953f1h 3 2 rate rate NOUN
zp38w953f1h 3 3 of of ADP
zp38w953f1h 3 4 data data NOUN
zp38w953f1h 3 5 generation generation NOUN
zp38w953f1h 3 6 from from ADP
zp38w953f1h 3 7 these these DET
zp38w953f1h 3 8 sequencing sequence VERB
zp38w953f1h 3 9 techniques technique NOUN
zp38w953f1h 3 10 has have AUX
zp38w953f1h 3 11 outpaced outpace VERB
zp38w953f1h 3 12 computing compute VERB
zp38w953f1h 3 13 throughput throughput NOUN
zp38w953f1h 3 14 , , PUNCT
zp38w953f1h 3 15 as as SCONJ
zp38w953f1h 3 16 predicted predict VERB
zp38w953f1h 3 17 by by ADP
zp38w953f1h 3 18 moore moore NOUN
zp38w953f1h 3 19 's 's PART
zp38w953f1h 3 20 law law NOUN
zp38w953f1h 3 21 , , PUNCT
zp38w953f1h 3 22 causing cause VERB
zp38w953f1h 3 23 a a DET
zp38w953f1h 3 24 major major ADJ
zp38w953f1h 3 25 bottleneck bottleneck NOUN
zp38w953f1h 3 26 in in ADP
zp38w953f1h 3 27 the the DET
zp38w953f1h 3 28 rate rate NOUN
zp38w953f1h 3 29 of of ADP
zp38w953f1h 3 30 data datum NOUN
zp38w953f1h 3 31 processing processing NOUN
zp38w953f1h 3 32 and and CCONJ
zp38w953f1h 3 33 analysis analysis NOUN
zp38w953f1h 3 34 . . PUNCT
zp38w953f1h 4 1 emerging emerge VERB
zp38w953f1h 4 2 genome genome NOUN
zp38w953f1h 4 3 data datum NOUN
zp38w953f1h 4 4 is be AUX
zp38w953f1h 4 5 also also ADV
zp38w953f1h 4 6 characterized characterize VERB
zp38w953f1h 4 7 by by ADP
zp38w953f1h 4 8 missing miss VERB
zp38w953f1h 4 9 and and CCONJ
zp38w953f1h 4 10 erroneous erroneous ADJ
zp38w953f1h 4 11 values value NOUN
zp38w953f1h 4 12 , , PUNCT
zp38w953f1h 4 13 that that PRON
zp38w953f1h 4 14 reduce reduce VERB
zp38w953f1h 4 15 data datum NOUN
zp38w953f1h 4 16 fidelity fidelity NOUN
zp38w953f1h 4 17 and and CCONJ
zp38w953f1h 4 18 limit limit VERB
zp38w953f1h 4 19 its its PRON
zp38w953f1h 4 20 applicability applicability NOUN
zp38w953f1h 4 21 for for ADP
zp38w953f1h 4 22 downstream downstream ADJ
zp38w953f1h 4 23 analysis analysis NOUN
zp38w953f1h 4 24 . . PUNCT
zp38w953f1h 5 1 this this PRON
zp38w953f1h 5 2 forms form VERB
zp38w953f1h 5 3 the the DET
zp38w953f1h 5 4 basis basis NOUN
zp38w953f1h 5 5 of of ADP
zp38w953f1h 5 6 the the DET
zp38w953f1h 5 7 following follow VERB
zp38w953f1h 5 8 research research NOUN
zp38w953f1h 5 9 questions question NOUN
zp38w953f1h 5 10 : : PUNCT
zp38w953f1h 5 11 ( ( PUNCT
zp38w953f1h 5 12 i i NOUN
zp38w953f1h 5 13 ) ) PUNCT
zp38w953f1h 5 14 can can AUX
zp38w953f1h 5 15 we we PRON
zp38w953f1h 5 16 design design VERB
zp38w953f1h 5 17 frameworks framework NOUN
zp38w953f1h 5 18 that that PRON
zp38w953f1h 5 19 can can AUX
zp38w953f1h 5 20 expedite expedite VERB
zp38w953f1h 5 21 data data NOUN
zp38w953f1h 5 22 analysis analysis NOUN
zp38w953f1h 5 23 and and CCONJ
zp38w953f1h 5 24 enable enable VERB
zp38w953f1h 5 25 efficient efficient ADJ
zp38w953f1h 5 26 utilization utilization NOUN
zp38w953f1h 5 27 of of ADP
zp38w953f1h 5 28 computational computational ADJ
zp38w953f1h 5 29 resources resource NOUN
zp38w953f1h 5 30 ? ? PUNCT
zp38w953f1h 6 1 ( ( PUNCT
zp38w953f1h 6 2 ii ii PROPN
zp38w953f1h 6 3 ) ) PUNCT
zp38w953f1h 6 4 can can AUX
zp38w953f1h 6 5 we we PRON
zp38w953f1h 6 6 develop develop VERB
zp38w953f1h 6 7 accurate accurate ADJ
zp38w953f1h 6 8 and and CCONJ
zp38w953f1h 6 9 efficient efficient ADJ
zp38w953f1h 6 10 algorithms algorithm NOUN
zp38w953f1h 6 11 to to PART
zp38w953f1h 6 12 improve improve VERB
zp38w953f1h 6 13 data data NOUN
zp38w953f1h 6 14 fidelity fidelity NOUN
zp38w953f1h 6 15 in in ADP
zp38w953f1h 6 16 genomic genomic PROPN
zp38w953f1h 6 17 applications?we applications?we PUNCT
zp38w953f1h 6 18 address address VERB
zp38w953f1h 6 19 the the DET
zp38w953f1h 6 20 first first ADJ
zp38w953f1h 6 21 problem problem NOUN
zp38w953f1h 6 22 by by ADP
zp38w953f1h 6 23 developing develop VERB
zp38w953f1h 6 24 a a DET
zp38w953f1h 6 25 parallel parallel ADJ
zp38w953f1h 6 26 data datum NOUN
zp38w953f1h 6 27 analysis analysis NOUN
zp38w953f1h 6 28 framework framework NOUN
zp38w953f1h 6 29 that that PRON
zp38w953f1h 6 30 accelerates accelerate VERB
zp38w953f1h 6 31 large large ADJ
zp38w953f1h 6 32 - - PUNCT
zp38w953f1h 6 33 scale scale NOUN
zp38w953f1h 6 34 comparative comparative ADJ
zp38w953f1h 6 35 genomics genomic NOUN
zp38w953f1h 6 36 applications application NOUN
zp38w953f1h 6 37 . . PUNCT
zp38w953f1h 7 1 we we PRON
zp38w953f1h 7 2 identify identify VERB
zp38w953f1h 7 3 that that DET
zp38w953f1h 7 4 optimal optimal ADJ
zp38w953f1h 7 5 data datum NOUN
zp38w953f1h 7 6 partitioning partition VERB
zp38w953f1h 7 7 and and CCONJ
zp38w953f1h 7 8 caching caching NOUN
zp38w953f1h 7 9 significantly significantly ADV
zp38w953f1h 7 10 improve improve VERB
zp38w953f1h 7 11 the the DET
zp38w953f1h 7 12 performance performance NOUN
zp38w953f1h 7 13 of of ADP
zp38w953f1h 7 14 such such ADJ
zp38w953f1h 7 15 framework framework NOUN
zp38w953f1h 7 16 . . PUNCT
zp38w953f1h 8 1 we we PRON
zp38w953f1h 8 2 further far ADV
zp38w953f1h 8 3 construct construct VERB
zp38w953f1h 8 4 a a DET
zp38w953f1h 8 5 predictive predictive ADJ
zp38w953f1h 8 6 model model NOUN
zp38w953f1h 8 7 to to PART
zp38w953f1h 8 8 estimate estimate VERB
zp38w953f1h 8 9 runtime runtime NOUN
zp38w953f1h 8 10 configurations configuration NOUN
zp38w953f1h 8 11 that that PRON
zp38w953f1h 8 12 facilitate facilitate VERB
zp38w953f1h 8 13 optimal optimal ADJ
zp38w953f1h 8 14 utilization utilization NOUN
zp38w953f1h 8 15 of of ADP
zp38w953f1h 8 16 cloud cloud NOUN
zp38w953f1h 8 17 and and CCONJ
zp38w953f1h 8 18 cluster cluster NOUN
zp38w953f1h 8 19 - - PUNCT
zp38w953f1h 8 20 based base VERB
zp38w953f1h 8 21 resources resource NOUN
zp38w953f1h 8 22 while while SCONJ
zp38w953f1h 8 23 executing execute VERB
zp38w953f1h 8 24 data data NOUN
zp38w953f1h 8 25 - - PUNCT
zp38w953f1h 8 26 intensive intensive ADJ
zp38w953f1h 8 27 applications application NOUN
zp38w953f1h 8 28 . . PUNCT
zp38w953f1h 9 1 the the DET
zp38w953f1h 9 2 fidelity fidelity NOUN
zp38w953f1h 9 3 of of ADP
zp38w953f1h 9 4 genomic genomic ADJ
zp38w953f1h 9 5 data datum NOUN
zp38w953f1h 9 6 derived derive VERB
zp38w953f1h 9 7 from from ADP
zp38w953f1h 9 8 next next ADJ
zp38w953f1h 9 9 - - PUNCT
zp38w953f1h 9 10 generation generation NOUN
zp38w953f1h 9 11 sequencing sequence VERB
zp38w953f1h 9 12 techniques technique NOUN
zp38w953f1h 9 13 impacts impact VERB
zp38w953f1h 9 14 downstream downstream ADJ
zp38w953f1h 9 15 applications application NOUN
zp38w953f1h 9 16 like like ADP
zp38w953f1h 9 17 genome genome NOUN
zp38w953f1h 9 18 - - PUNCT
zp38w953f1h 9 19 wide wide ADJ
zp38w953f1h 9 20 association association NOUN
zp38w953f1h 9 21 study study NOUN
zp38w953f1h 9 22 ( ( PUNCT
zp38w953f1h 9 23 gwas gwas PROPN
zp38w953f1h 9 24 ) ) PUNCT
zp38w953f1h 9 25 and and CCONJ
zp38w953f1h 9 26 genome genome NOUN
zp38w953f1h 9 27 assembly assembly NOUN
zp38w953f1h 9 28 . . PUNCT
zp38w953f1h 10 1 for for ADP
zp38w953f1h 10 2 imputation imputation NOUN
zp38w953f1h 10 3 of of ADP
zp38w953f1h 10 4 missing miss VERB
zp38w953f1h 10 5 genotype genotype NOUN
zp38w953f1h 10 6 data datum NOUN
zp38w953f1h 10 7 , , PUNCT
zp38w953f1h 10 8 we we PRON
zp38w953f1h 10 9 design design VERB
zp38w953f1h 10 10 an an DET
zp38w953f1h 10 11 accurate accurate ADJ
zp38w953f1h 10 12 , , PUNCT
zp38w953f1h 10 13 fast fast ADJ
zp38w953f1h 10 14 , , PUNCT
zp38w953f1h 10 15 and and CCONJ
zp38w953f1h 10 16 lightweight lightweight ADJ
zp38w953f1h 10 17 algorithm algorithm PROPN
zp38w953f1h 10 18 for for ADP
zp38w953f1h 10 19 both both DET
zp38w953f1h 10 20 model model NOUN
zp38w953f1h 10 21 ( ( PUNCT
zp38w953f1h 10 22 with with ADP
zp38w953f1h 10 23 a a DET
zp38w953f1h 10 24 reference reference NOUN
zp38w953f1h 10 25 genotype genotype NOUN
zp38w953f1h 10 26 panel panel NOUN
zp38w953f1h 10 27 ) ) PUNCT
zp38w953f1h 10 28 and and CCONJ
zp38w953f1h 10 29 non non ADJ
zp38w953f1h 10 30 - - ADJ
zp38w953f1h 10 31 model model ADJ
zp38w953f1h 10 32 ( ( PUNCT
zp38w953f1h 10 33 without without ADP
zp38w953f1h 10 34 a a DET
zp38w953f1h 10 35 reference reference NOUN
zp38w953f1h 10 36 genotype genotype NOUN
zp38w953f1h 10 37 panel panel NOUN
zp38w953f1h 10 38 ) ) PUNCT
zp38w953f1h 10 39 organisms organism NOUN
zp38w953f1h 10 40 . . PUNCT
zp38w953f1h 11 1 to to PART
zp38w953f1h 11 2 correct correct VERB
zp38w953f1h 11 3 erroneous erroneous ADJ
zp38w953f1h 11 4 long long ADJ
zp38w953f1h 11 5 reads read NOUN
zp38w953f1h 11 6 generated generate VERB
zp38w953f1h 11 7 by by ADP
zp38w953f1h 11 8 emerging emerge VERB
zp38w953f1h 11 9 sequencing sequence VERB
zp38w953f1h 11 10 techniques technique NOUN
zp38w953f1h 11 11 , , PUNCT
zp38w953f1h 11 12 we we PRON
zp38w953f1h 11 13 formulate formulate VERB
zp38w953f1h 11 14 a a DET
zp38w953f1h 11 15 hybrid hybrid ADJ
zp38w953f1h 11 16 correction correction NOUN
zp38w953f1h 11 17 algorithm algorithm X
zp38w953f1h 11 18 that that PRON
zp38w953f1h 11 19 determines determine VERB
zp38w953f1h 11 20 a a DET
zp38w953f1h 11 21 correction correction NOUN
zp38w953f1h 11 22 policy policy NOUN
zp38w953f1h 11 23 based base VERB
zp38w953f1h 11 24 on on ADP
zp38w953f1h 11 25 an an DET
zp38w953f1h 11 26 optimal optimal ADJ
zp38w953f1h 11 27 combination combination NOUN
zp38w953f1h 11 28 of of ADP
zp38w953f1h 11 29 base base ADJ
zp38w953f1h 11 30 quality quality NOUN
zp38w953f1h 11 31 and and CCONJ
zp38w953f1h 11 32 similarity similarity NOUN
zp38w953f1h 11 33 of of ADP
zp38w953f1h 11 34 aligned align VERB
zp38w953f1h 11 35 short short ADJ
zp38w953f1h 11 36 reads read NOUN
zp38w953f1h 11 37 . . PUNCT
zp38w953f1h 12 1 we we PRON
zp38w953f1h 12 2 extend extend VERB
zp38w953f1h 12 3 the the DET
zp38w953f1h 12 4 core core NOUN
zp38w953f1h 12 5 algorithm algorithm NOUN
zp38w953f1h 12 6 by by ADP
zp38w953f1h 12 7 proposing propose VERB
zp38w953f1h 12 8 an an DET
zp38w953f1h 12 9 iterative iterative ADJ
zp38w953f1h 12 10 learning learning NOUN
zp38w953f1h 12 11 paradigm paradigm NOUN
zp38w953f1h 12 12 that that PRON
zp38w953f1h 12 13 further far ADV
zp38w953f1h 12 14 improves improve VERB
zp38w953f1h 12 15 its its PRON
zp38w953f1h 12 16 performance.our performance.our PRON
zp38w953f1h 12 17 proposed propose VERB
zp38w953f1h 12 18 data datum NOUN
zp38w953f1h 12 19 analysis analysis NOUN
zp38w953f1h 12 20 framework framework NOUN
zp38w953f1h 12 21 is be AUX
zp38w953f1h 12 22 accessible accessible ADJ
zp38w953f1h 12 23 to to ADP
zp38w953f1h 12 24 the the DET
zp38w953f1h 12 25 scientific scientific ADJ
zp38w953f1h 12 26 community community NOUN
zp38w953f1h 12 27 and and CCONJ
zp38w953f1h 12 28 has have AUX
zp38w953f1h 12 29 been be AUX
zp38w953f1h 12 30 used use VERB
zp38w953f1h 12 31 to to PART
zp38w953f1h 12 32 study study VERB
zp38w953f1h 12 33 the the DET
zp38w953f1h 12 34 genomes genome NOUN
zp38w953f1h 12 35 of of ADP
zp38w953f1h 12 36 important important ADJ
zp38w953f1h 12 37 plant plant NOUN
zp38w953f1h 12 38 species specie NOUN
zp38w953f1h 12 39 and and CCONJ
zp38w953f1h 12 40 malaria malaria PROPN
zp38w953f1h 12 41 vector vector NOUN
zp38w953f1h 12 42 mosquitoes mosquito NOUN
zp38w953f1h 12 43 . . PUNCT
zp38w953f1h 13 1 the the DET
zp38w953f1h 13 2 predictive predictive ADJ
zp38w953f1h 13 3 models model NOUN
zp38w953f1h 13 4 exhibit exhibit VERB
zp38w953f1h 13 5 high high ADJ
zp38w953f1h 13 6 accuracy accuracy NOUN
zp38w953f1h 13 7 in in ADP
zp38w953f1h 13 8 determining determine VERB
zp38w953f1h 13 9 optimal optimal ADJ
zp38w953f1h 13 10 parameters parameter NOUN
zp38w953f1h 13 11 of of ADP
zp38w953f1h 13 12 operation operation NOUN
zp38w953f1h 13 13 on on ADP
zp38w953f1h 13 14 commercial commercial ADJ
zp38w953f1h 13 15 cloud cloud NOUN
zp38w953f1h 13 16 services service NOUN
zp38w953f1h 13 17 like like ADP
zp38w953f1h 13 18 amazon amazon PROPN
zp38w953f1h 13 19 ec2 ec2 PROPN
zp38w953f1h 13 20 and and CCONJ
zp38w953f1h 13 21 microsoft microsoft PROPN
zp38w953f1h 13 22 azure azure PROPN
zp38w953f1h 13 23 . . PUNCT
zp38w953f1h 14 1 finally finally ADV
zp38w953f1h 14 2 , , PUNCT
zp38w953f1h 14 3 the the DET
zp38w953f1h 14 4 imputation imputation NOUN
zp38w953f1h 14 5 and and CCONJ
zp38w953f1h 14 6 error error NOUN
zp38w953f1h 14 7 correction correction NOUN
zp38w953f1h 14 8 algorithms algorithm NOUN
zp38w953f1h 14 9 outperform outperform VERB
zp38w953f1h 14 10 state state NOUN
zp38w953f1h 14 11 - - PUNCT
zp38w953f1h 14 12 of of ADP
zp38w953f1h 14 13 - - PUNCT
zp38w953f1h 14 14 the the DET
zp38w953f1h 14 15 - - PUNCT
zp38w953f1h 14 16 art art NOUN
zp38w953f1h 14 17 alternatives alternative NOUN
zp38w953f1h 14 18 when when SCONJ
zp38w953f1h 14 19 tested test VERB
zp38w953f1h 14 20 on on ADP
zp38w953f1h 14 21 real real ADJ
zp38w953f1h 14 22 data data NOUN
zp38w953f1h 14 23 sets set NOUN
zp38w953f1h 14 24 of of ADP
zp38w953f1h 14 25 plants plant NOUN
zp38w953f1h 14 26 , , PUNCT
zp38w953f1h 14 27 malarial malarial ADJ
zp38w953f1h 14 28 mosquitoes mosquito NOUN
zp38w953f1h 14 29 , , PUNCT
zp38w953f1h 14 30 and and CCONJ
zp38w953f1h 14 31 humans human NOUN
zp38w953f1h 14 32 . . PUNCT
zp38w953f1h 15 1 hence hence ADV
zp38w953f1h 15 2 , , PUNCT
zp38w953f1h 15 3 in in ADP
zp38w953f1h 15 4 this this DET
zp38w953f1h 15 5 thesis thesis NOUN
zp38w953f1h 15 6 , , PUNCT
zp38w953f1h 15 7 we we PRON
zp38w953f1h 15 8 present present VERB
zp38w953f1h 15 9 novel novel ADJ
zp38w953f1h 15 10 solutions solution NOUN
zp38w953f1h 15 11 to to PART
zp38w953f1h 15 12 expedite expedite VERB
zp38w953f1h 15 13 data data NOUN
zp38w953f1h 15 14 - - PUNCT
zp38w953f1h 15 15 parallel parallel NOUN
zp38w953f1h 15 16 genomic genomic NOUN
zp38w953f1h 15 17 applications application NOUN
zp38w953f1h 15 18 while while SCONJ
zp38w953f1h 15 19 optimizing optimize VERB
zp38w953f1h 15 20 cloud cloud NOUN
zp38w953f1h 15 21 and and CCONJ
zp38w953f1h 15 22 cluster cluster NOUN
zp38w953f1h 15 23 - - PUNCT
zp38w953f1h 15 24 based base VERB
zp38w953f1h 15 25 resource resource NOUN
zp38w953f1h 15 26 utilization utilization NOUN
zp38w953f1h 15 27 . . PUNCT
zp38w953f1h 16 1 we we PRON
zp38w953f1h 16 2 also also ADV
zp38w953f1h 16 3 design design VERB
zp38w953f1h 16 4 novel novel NOUN
zp38w953f1h 16 5 , , PUNCT
zp38w953f1h 16 6 accurate accurate ADJ
zp38w953f1h 16 7 , , PUNCT
zp38w953f1h 16 8 and and CCONJ
zp38w953f1h 16 9 efficient efficient ADJ
zp38w953f1h 16 10 algorithms algorithm NOUN
zp38w953f1h 16 11 to to PART
zp38w953f1h 16 12 impute impute VERB
zp38w953f1h 16 13 missing miss VERB
zp38w953f1h 16 14 data datum NOUN
zp38w953f1h 16 15 and and CCONJ
zp38w953f1h 16 16 correct correct VERB
zp38w953f1h 16 17 erroneous erroneous ADJ
zp38w953f1h 16 18 data datum NOUN
zp38w953f1h 16 19 in in ADP
zp38w953f1h 16 20 emerging emerge VERB
zp38w953f1h 16 21 genomic genomic ADJ
zp38w953f1h 16 22 applications application NOUN
zp38w953f1h 16 23 . . PUNCT