id sid tid token lemma pos
mg74qj7616h 1 1 machine machine NOUN
mg74qj7616h 1 2 translation translation NOUN
mg74qj7616h 1 3 , , PUNCT
mg74qj7616h 1 4 the the DET
mg74qj7616h 1 5 subfield subfield NOUN
mg74qj7616h 1 6 of of ADP
mg74qj7616h 1 7 computer computer NOUN
mg74qj7616h 1 8 science science NOUN
mg74qj7616h 1 9 that that PRON
mg74qj7616h 1 10 focuses focus VERB
mg74qj7616h 1 11 on on ADP
mg74qj7616h 1 12 translating translate VERB
mg74qj7616h 1 13 between between ADP
mg74qj7616h 1 14 two two NUM
mg74qj7616h 1 15 human human ADJ
mg74qj7616h 1 16 languages language NOUN
mg74qj7616h 1 17 , , PUNCT
mg74qj7616h 1 18 has have AUX
mg74qj7616h 1 19 greatly greatly ADV
mg74qj7616h 1 20 benefited benefit VERB
mg74qj7616h 1 21 from from ADP
mg74qj7616h 1 22 neural neural ADJ
mg74qj7616h 1 23 networks network NOUN
mg74qj7616h 1 24 . . PUNCT
mg74qj7616h 2 1 however however ADV
mg74qj7616h 2 2 , , PUNCT
mg74qj7616h 2 3 these these DET
mg74qj7616h 2 4 neural neural ADJ
mg74qj7616h 2 5 machine machine NOUN
mg74qj7616h 2 6 translation translation NOUN
mg74qj7616h 2 7 systems system NOUN
mg74qj7616h 2 8 have have VERB
mg74qj7616h 2 9 complicated complicated ADJ
mg74qj7616h 2 10 architectures architecture NOUN
mg74qj7616h 2 11 with with ADP
mg74qj7616h 2 12 many many ADJ
mg74qj7616h 2 13 hyperparameters hyperparameter NOUN
mg74qj7616h 2 14 that that PRON
mg74qj7616h 2 15 need need VERB
mg74qj7616h 2 16 to to PART
mg74qj7616h 2 17 be be AUX
mg74qj7616h 2 18 manually manually ADV
mg74qj7616h 2 19 chosen choose VERB
mg74qj7616h 2 20 . . PUNCT
mg74qj7616h 3 1 frequently frequently ADV
mg74qj7616h 3 2 , , PUNCT
mg74qj7616h 3 3 these these PRON
mg74qj7616h 3 4 are be AUX
mg74qj7616h 3 5 selected select VERB
mg74qj7616h 3 6 either either CCONJ
mg74qj7616h 3 7 through through ADP
mg74qj7616h 3 8 a a DET
mg74qj7616h 3 9 grid grid NOUN
mg74qj7616h 3 10 search search NOUN
mg74qj7616h 3 11 over over ADP
mg74qj7616h 3 12 values value NOUN
mg74qj7616h 3 13 , , PUNCT
mg74qj7616h 3 14 or or CCONJ
mg74qj7616h 3 15 by by ADP
mg74qj7616h 3 16 using use VERB
mg74qj7616h 3 17 values value NOUN
mg74qj7616h 3 18 commonplace commonplace ADJ
mg74qj7616h 3 19 in in ADP
mg74qj7616h 3 20 the the DET
mg74qj7616h 3 21 literature literature NOUN
mg74qj7616h 3 22 . . PUNCT
mg74qj7616h 4 1 however however ADV
mg74qj7616h 4 2 , , PUNCT
mg74qj7616h 4 3 these these PRON
mg74qj7616h 4 4 are be AUX
mg74qj7616h 4 5 not not PART
mg74qj7616h 4 6 theoretically theoretically ADV
mg74qj7616h 4 7 justified justify VERB
mg74qj7616h 4 8 and and CCONJ
mg74qj7616h 4 9 the the DET
mg74qj7616h 4 10 same same ADJ
mg74qj7616h 4 11 values value NOUN
mg74qj7616h 4 12 are be AUX
mg74qj7616h 4 13 not not PART
mg74qj7616h 4 14 optimal optimal ADJ
mg74qj7616h 4 15 for for ADP
mg74qj7616h 4 16 all all DET
mg74qj7616h 4 17 language language NOUN
mg74qj7616h 4 18 pairs pair NOUN
mg74qj7616h 4 19 and and CCONJ
mg74qj7616h 4 20 datasets dataset NOUN
mg74qj7616h 4 21 . . PUNCT
mg74qj7616h 5 1 fortunately fortunately ADV
mg74qj7616h 5 2 , , PUNCT
mg74qj7616h 5 3 the the DET
mg74qj7616h 5 4 innate innate ADJ
mg74qj7616h 5 5 structure structure NOUN
mg74qj7616h 5 6 of of ADP
mg74qj7616h 5 7 the the DET
mg74qj7616h 5 8 problem problem NOUN
mg74qj7616h 5 9 allows allow VERB
mg74qj7616h 5 10 for for ADP
mg74qj7616h 5 11 optimization optimization NOUN
mg74qj7616h 5 12 of of ADP
mg74qj7616h 5 13 these these DET
mg74qj7616h 5 14 hyperparameters hyperparameter NOUN
mg74qj7616h 5 15 during during ADP
mg74qj7616h 5 16 training training NOUN
mg74qj7616h 5 17 . . PUNCT
mg74qj7616h 6 1 traditionally traditionally ADV
mg74qj7616h 6 2 , , PUNCT
mg74qj7616h 6 3 the the DET
mg74qj7616h 6 4 hyperparameters hyperparameter NOUN
mg74qj7616h 6 5 of of ADP
mg74qj7616h 6 6 a a DET
mg74qj7616h 6 7 system system NOUN
mg74qj7616h 6 8 are be AUX
mg74qj7616h 6 9 chosen choose VERB
mg74qj7616h 6 10 and and CCONJ
mg74qj7616h 6 11 then then ADV
mg74qj7616h 6 12 a a DET
mg74qj7616h 6 13 learning learning NOUN
mg74qj7616h 6 14 algorithm algorithm NOUN
mg74qj7616h 6 15 optimizes optimize VERB
mg74qj7616h 6 16 all all PRON
mg74qj7616h 6 17 of of ADP
mg74qj7616h 6 18 the the DET
mg74qj7616h 6 19 parameters parameter NOUN
mg74qj7616h 6 20 within within ADP
mg74qj7616h 6 21 the the DET
mg74qj7616h 6 22 model model NOUN
mg74qj7616h 6 23 . . PUNCT
mg74qj7616h 7 1 in in ADP
mg74qj7616h 7 2 this this DET
mg74qj7616h 7 3 work work NOUN
mg74qj7616h 7 4 , , PUNCT
mg74qj7616h 7 5 i i PRON
mg74qj7616h 7 6 propose propose VERB
mg74qj7616h 7 7 three three NUM
mg74qj7616h 7 8 methods method NOUN
mg74qj7616h 7 9 to to PART
mg74qj7616h 7 10 learn learn VERB
mg74qj7616h 7 11 the the DET
mg74qj7616h 7 12 optimal optimal ADJ
mg74qj7616h 7 13 hyperparameters hyperparameter NOUN
mg74qj7616h 7 14 during during ADP
mg74qj7616h 7 15 the the DET
mg74qj7616h 7 16 training training NOUN
mg74qj7616h 7 17 of of ADP
mg74qj7616h 7 18 the the DET
mg74qj7616h 7 19 model model NOUN
mg74qj7616h 7 20 , , PUNCT
mg74qj7616h 7 21 allowing allow VERB
mg74qj7616h 7 22 for for ADP
mg74qj7616h 7 23 one one NUM
mg74qj7616h 7 24 step step NOUN
mg74qj7616h 7 25 instead instead ADV
mg74qj7616h 7 26 of of ADP
mg74qj7616h 7 27 two two NUM
mg74qj7616h 7 28 . . PUNCT
mg74qj7616h 8 1 first first ADV
mg74qj7616h 8 2 , , PUNCT
mg74qj7616h 8 3 i i PRON
mg74qj7616h 8 4 propose propose VERB
mg74qj7616h 8 5 using use VERB
mg74qj7616h 8 6 group group NOUN
mg74qj7616h 8 7 regularizers regularizer NOUN
mg74qj7616h 8 8 to to PART
mg74qj7616h 8 9 learn learn VERB
mg74qj7616h 8 10 the the DET
mg74qj7616h 8 11 number number NOUN
mg74qj7616h 8 12 , , PUNCT
mg74qj7616h 8 13 and and CCONJ
mg74qj7616h 8 14 size size NOUN
mg74qj7616h 8 15 of of ADP
mg74qj7616h 8 16 , , PUNCT
mg74qj7616h 8 17 the the DET
mg74qj7616h 8 18 hidden hide VERB
mg74qj7616h 8 19 neural neural ADJ
mg74qj7616h 8 20 network network NOUN
mg74qj7616h 8 21 layers layer NOUN
mg74qj7616h 8 22 . . PUNCT
mg74qj7616h 9 1 second second ADV
mg74qj7616h 9 2 , , PUNCT
mg74qj7616h 9 3 i i PRON
mg74qj7616h 9 4 demonstrate demonstrate VERB
mg74qj7616h 9 5 how how SCONJ
mg74qj7616h 9 6 to to PART
mg74qj7616h 9 7 use use VERB
mg74qj7616h 9 8 a a DET
mg74qj7616h 9 9 perceptron perceptron NOUN
mg74qj7616h 9 10 - - PUNCT
mg74qj7616h 9 11 like like ADJ
mg74qj7616h 9 12 tuning tuning NOUN
mg74qj7616h 9 13 method method NOUN
mg74qj7616h 9 14 to to PART
mg74qj7616h 9 15 solve solve VERB
mg74qj7616h 9 16 known known ADJ
mg74qj7616h 9 17 problems problem NOUN
mg74qj7616h 9 18 of of ADP
mg74qj7616h 9 19 undertranslation undertranslation NOUN
mg74qj7616h 9 20 and and CCONJ
mg74qj7616h 9 21 label label NOUN
mg74qj7616h 9 22 bias bias NOUN
mg74qj7616h 9 23 . . PUNCT
mg74qj7616h 10 1 finally finally ADV
mg74qj7616h 10 2 , , PUNCT
mg74qj7616h 10 3 i i PRON
mg74qj7616h 10 4 propose propose VERB
mg74qj7616h 10 5 an an DET
mg74qj7616h 10 6 expectation expectation NOUN
mg74qj7616h 10 7 - - PUNCT
mg74qj7616h 10 8 maximization maximization NOUN
mg74qj7616h 10 9 based base VERB
mg74qj7616h 10 10 method method NOUN
mg74qj7616h 10 11 to to PART
mg74qj7616h 10 12 learn learn VERB
mg74qj7616h 10 13 the the DET
mg74qj7616h 10 14 optimal optimal ADJ
mg74qj7616h 10 15 vocabulary vocabulary NOUN
mg74qj7616h 10 16 size size NOUN
mg74qj7616h 10 17 and and CCONJ
mg74qj7616h 10 18 granularity granularity NOUN
mg74qj7616h 10 19 . . PUNCT
mg74qj7616h 11 1 using use VERB
mg74qj7616h 11 2 various various ADJ
mg74qj7616h 11 3 techniques technique NOUN
mg74qj7616h 11 4 from from ADP
mg74qj7616h 11 5 machine machine NOUN
mg74qj7616h 11 6 learning learning NOUN
mg74qj7616h 11 7 and and CCONJ
mg74qj7616h 11 8 numerical numerical ADJ
mg74qj7616h 11 9 optimization optimization NOUN
mg74qj7616h 11 10 , , PUNCT
mg74qj7616h 11 11 this this DET
mg74qj7616h 11 12 dissertation dissertation NOUN
mg74qj7616h 11 13 covers cover VERB
mg74qj7616h 11 14 how how SCONJ
mg74qj7616h 11 15 to to PART
mg74qj7616h 11 16 learn learn VERB
mg74qj7616h 11 17 hyperparameters hyperparameter NOUN
mg74qj7616h 11 18 of of ADP
mg74qj7616h 11 19 a a DET
mg74qj7616h 11 20 neural neural ADJ
mg74qj7616h 11 21 machine machine NOUN
mg74qj7616h 11 22 translation translation NOUN
mg74qj7616h 11 23 system system NOUN
mg74qj7616h 11 24 while while SCONJ
mg74qj7616h 11 25 training train VERB
mg74qj7616h 11 26 the the DET
mg74qj7616h 11 27 model model NOUN
mg74qj7616h 11 28 itself itself PRON
mg74qj7616h 11 29 . . PUNCT