id: work_47lp4am6gndwhjbxhphpyiw2zy
author: Liunian Harold Li
title: Efficient Contextual Representation Learning With Continuous Outputs
date: 2019
pages: 14
extension: .pdf
mime: application/pdf
words: 8847
sentences: 869
flesch: 65
summary: These language-model-based encoders are difficult to train due to their large parameter size. The continuous output layer perfectly serves our desire to decouple learning contexts and words and to devote most computational resources to the contextual encoder. Several approaches reduce the complexity of the output layer in language models, such as the adaptive softmax. Pre-trained contextual representations are applied to downstream tasks in two ways: 1) feature-based and 2) fine-tuning. A potential drawback of these subword-level language models, however, is that they produce representations for fragments of words. The continuous output layer has a reduced arithmetic complexity and trainable parameter size compared with the softmax layer in ELMo trained on the One Billion Word Benchmark. Communication cost: to train large neural network models, using multiple GPUs almost becomes a necessity; the continuous output layer, on the other hand, incurs little communication cost across GPUs. Open-vocabulary training: with the continuous output layer, we can conduct training on an arbitrary sequence of words. The pre-trained word embedding affects the performance of the model; a pre-trained CNN layer can be used as the word embedding.
cache: ./cache/work_47lp4am6gndwhjbxhphpyiw2zy.pdf
txt: ./txt/work_47lp4am6gndwhjbxhphpyiw2zy.txt
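To make the summary's central claim concrete, the sketch below contrasts a conventional softmax output layer, whose trainable parameters and per-step arithmetic scale with the vocabulary size, with a continuous output layer that projects the context vector into a fixed pre-trained embedding space and is trained by regression. This is a minimal PyTorch illustration written from the summary fragments, not the authors' code: the sizes, the cosine loss, and the names softmax_head, proj, pretrained_emb, continuous_loss are all illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Illustrative sizes; the One Billion Word vocabulary is on the order of 800K types.
    vocab_size, hidden_dim, emb_dim = 50_000, 512, 300

    # Conventional softmax output layer: a |V| x d weight matrix that dominates
    # both the trainable parameter count and the per-step arithmetic cost.
    softmax_head = nn.Linear(hidden_dim, vocab_size)

    def softmax_loss(context, target_ids):
        logits = softmax_head(context)              # (batch, |V|)
        return F.cross_entropy(logits, target_ids)

    # Continuous output layer: keep a frozen table of pre-trained target word
    # embeddings and train the encoder to regress onto them, so the trainable
    # output parameters shrink to a small hidden_dim x emb_dim projection and
    # no |V|-sized matrix needs to be synchronized across GPUs.
    pretrained_emb = nn.Embedding(vocab_size, emb_dim)   # in practice loaded from pre-trained vectors
    pretrained_emb.weight.requires_grad_(False)
    proj = nn.Linear(hidden_dim, emb_dim)

    def continuous_loss(context, target_ids):
        pred = proj(context)                        # (batch, emb_dim)
        gold = pretrained_emb(target_ids)           # (batch, emb_dim), fixed targets
        # Cosine distance is one possible regression loss; L2 or von Mises-Fisher
        # losses are other options used in the continuous-output literature.
        return (1.0 - F.cosine_similarity(pred, gold, dim=-1)).mean()

    # Example: one training step's loss on random inputs.
    ctx = torch.randn(8, hidden_dim)
    tgt = torch.randint(0, vocab_size, (8,))
    loss = continuous_loss(ctx, tgt)

Because the regression targets are fixed pre-trained embeddings looked up per word, this kind of head is also what makes the open-vocabulary training mentioned above possible: any word with a pre-trained vector can serve as a target, with no fixed softmax vocabulary required.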